A code-mixed task-oriented dialog dataset for medical domain

Cited by: 4

Authors:

Dowlagar, Suman [1]

Mamidi, Radhika [1]

Affiliations:

[1] Int Inst Informat Technol, Language Technol Res Ctr, Hyderabad 506002, Telangana, India

Source:

COMPUTER SPEECH AND LANGUAGE | 2023 / Vol. 78

Keywords:

Code-mixed; Dialog dataset; Medical domain; Task oriented; LANGUAGE; COMMUNICATION; NETWORKS; SYSTEMS;

DOI:

10.1016/j.csl.2022.101449

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In the healthcare domain, interactions between medical practitioners and patients form a crucial part of diagnosis. Initially, the AI models developed for healthcare centered only on monolingual data. However, such models do not cater to multilingual regions, where most conversations are Code-Mixed. We present the Code-Mixed Medical Task-Oriented Dialog Dataset to facilitate the research and development of Code-Mixed medical dialog systems. We analyzed the dataset using medical, conversational, and linguistic theories. The dataset contains 3005 Telugu-English Code-Mixed dialogs between patients and doctors, with 29k utterances covering ten specializations and an average code-mixing index (CMI) of 33.3%. We manually annotated the conversational dataset with intent and slot labels. We also present baselines to establish benchmarks on the dataset using existing state-of-the-art Natural Language Understanding (NLU) models. We improved on the existing baselines by using contextual ground-truth intent labels and by processing the slots as chunks. The data is made publicly available.
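The abstract reports an average code-mixing index (CMI) but does not state the formula. As a minimal sketch, assuming the commonly used utterance-level definition CMI = 100 × (1 − max(w_i) / (n − u)), where n is the token count, u the number of language-independent tokens, and max(w_i) the token count of the dominant language, the computation might look like the following. The tag names ("te", "en", "univ") and both function names are hypothetical, and the dataset's actual tagging scheme may differ.

```python
from collections import Counter


def utterance_cmi(tags, neutral=("univ",)):
    """CMI of one utterance from per-token language tags (assumed definition).

    Tokens whose tag is in `neutral` are treated as language-independent.
    """
    n = len(tags)
    lang_counts = Counter(t for t in tags if t not in neutral)
    language_tokens = sum(lang_counts.values())  # this is n - u
    if n == 0 or language_tokens == 0:
        return 0.0  # empty utterance, or only language-independent tokens
    dominant = max(lang_counts.values())
    return 100.0 * (1.0 - dominant / language_tokens)


def average_cmi(tagged_utterances):
    """Unweighted mean of utterance-level CMI over a collection of utterances."""
    if not tagged_utterances:
        return 0.0
    return sum(utterance_cmi(t) for t in tagged_utterances) / len(tagged_utterances)


if __name__ == "__main__":
    # Hypothetical tag sequences, not taken from the dataset.
    mixed = ["te", "en", "te", "en", "univ"]
    monolingual = ["te", "te", "te"]
    print(utterance_cmi(mixed))
    print(utterance_cmi(monolingual))
    print(average_cmi([mixed, monolingual]))
```

Under this definition a monolingual utterance scores 0 and an even two-language split scores 50, so an average of 33.3% would indicate substantial mixing within utterances.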


Pages: 34
