Multi³WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems

Times cited: 0

Authors:

Hu, Songbo [1]
Zhou, Han [1]
Hergul, Mete [1]
Gritta, Milan [2]
Zhang, Guchun [2]
Iacobacci, Ignacio [2]
Vulic, Ivan [1]
Korhonen, Anna [1]

Affiliations:

[1] University of Cambridge, Language Technology Lab, Cambridge, England

[2] Huawei Noah's Ark Lab, London, England

Source:

TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS | 2023 / Vol. 11

DOI:

10.1162/tacl_a_00609

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Creating high-quality annotated data for task-oriented dialog (ToD) is notoriously difficult, and the challenges are amplified when the goal is to create equitable, culturally adapted, and large-scale ToD datasets for multiple languages. As a result, such datasets remain very scarce, and existing ones suffer from limitations such as non-native, translation-based dialogs with translation artefacts, small scale, or a lack of cultural adaptation, among others. In this work, we first take stock of the current landscape of multilingual ToD datasets, offering a systematic overview of their properties and limitations. Aiming to reduce all the detected limitations, we then introduce Multi³WOZ, a novel multilingual, multi-domain, multi-parallel ToD dataset. It is large-scale and offers culturally adapted dialogs in 4 languages, enabling the training and evaluation of multilingual and cross-lingual ToD systems. We describe the complex bottom-up data collection process that yielded the final dataset, and provide the first sets of baseline scores across different ToD-related tasks for future reference, which also highlight the dataset's challenging nature.

Pages: 1396-1415

Number of pages: 20
