A code-mixed task-oriented dialog dataset for medical domain

Times cited: 4
Authors
Dowlagar, Suman [1]
Mamidi, Radhika [1]
Affiliation
[1] Int Inst Informat Technol, Language Technol Res Ctr, Hyderabad 506002, Telangana, India
Source
Computer Speech & Language
Keywords
Code-mixed; Dialog dataset; Medical domain; Task oriented; LANGUAGE; COMMUNICATION; NETWORKS; SYSTEMS
DOI
10.1016/j.csl.2022.101449
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the healthcare domain, interactions between medical practitioners and patients form a crucial part of diagnosis. AI models for healthcare were initially developed only on monolingual data, so they do not serve multilingual regions, where most conversations are code-mixed. We present the Code-Mixed Medical Task-Oriented Dialog Dataset to facilitate research on, and development of, code-mixed medical dialog systems. We analyzed the dataset using medical, conversational, and linguistic theories. The dataset contains 3,005 Telugu-English code-mixed dialogs between patients and doctors, comprising 29K utterances that cover ten specializations, with an average code-mixing index (CMI) of 33.3%. We manually annotated the dialogs with intent and slot labels. We also present baselines that establish benchmarks on the dataset using existing state-of-the-art natural language understanding (NLU) models, and we improved on these baselines by using contextual ground-truth intent labels and by processing the slots as chunks. The data is publicly available.
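The abstract reports an average code-mixing index (CMI) without defining it. A minimal sketch, assuming the widely used utterance-level definition of Das and Gambäck (2014): CMI = 100 × (1 − max(w_i) / (n − u)) when n > u, and 0 otherwise, where w_i is the number of tokens in language i, n is the number of tokens, and u is the number of language-independent tokens. The function names, the tag strings "te", "en", and "univ", and the choice to average over all utterances (monolingual ones included) are illustrative assumptions; the paper's exact tokenization, language tagging, and averaging may differ.

```python
from collections import Counter
from typing import Iterable, Sequence

# Hypothetical tag for language-independent tokens (punctuation, numbers, names).
NEUTRAL_TAGS = frozenset({"univ"})


def code_mixing_index(lang_tags: Sequence[str], neutral: frozenset = NEUTRAL_TAGS) -> float:
    """Utterance-level CMI, assuming the Das & Gambäck (2014) definition.

    lang_tags holds one language tag per token, e.g. "te" or "en";
    tags in `neutral` are counted as language-independent tokens (u).
    """
    n = len(lang_tags)
    # w_i: number of tokens tagged with each language i
    counts = Counter(tag for tag in lang_tags if tag not in neutral)
    u = n - sum(counts.values())
    if n == u:
        # No language-tagged tokens (includes the empty utterance): CMI is 0.
        return 0.0
    # Monolingual utterances give max(w_i) == n - u, hence CMI 0.
    return 100.0 * (1.0 - max(counts.values()) / (n - u))


def average_cmi(utterances: Iterable[Sequence[str]], neutral: frozenset = NEUTRAL_TAGS) -> float:
    """Mean CMI over all utterances, monolingual ones included (an assumption)."""
    scores = [code_mixing_index(tags, neutral) for tags in utterances]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, an utterance tagged te, en, en, te, te has n = 5, u = 0, and a dominant-language count of 3, so its CMI is 100 × (1 − 3/5) = 40. Whether a corpus-level figure such as the reported 33.3% averages over all utterances or only over code-mixed ones changes the result, which is why the averaging rule is flagged as an assumption.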
Pages: 34
Related papers (50 in total)
  • [1] TamilATIS: Dataset for Task-Oriented Dialog in Tamil
    Ramaneswaran, S.
    Vijay, Sanchit
    Srinivasan, Kathiravan
    PROCEEDINGS OF THE SECOND WORKSHOP ON SPEECH AND LANGUAGE TECHNOLOGIES FOR DRAVIDIAN LANGUAGES (DRAVIDIANLANGTECH 2022), 2022, pp. 25-32
  • [2] SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations
    Kottur, Satwik
    Moon, Seungwhan
    Geramifard, Alborz
    Damavandi, Babak
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, pp. 4903-4912
  • [3] Incremental Dialog Processing in a Task-Oriented Dialog
    Ghigi, Fabrizio
    Eskenazi, Maxine
    Ines Torres, M.
    Lee, Sungjin
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, pp. 308-312
  • [4] Combining Open Domain Question Answering with a Task-Oriented Dialog System
    Nehring, Jan
    Feldhus, Nils
    Ahmed, Akhyar
    Kaur, Harleen
    1ST WORKSHOP ON DOCUMENT-GROUNDED DIALOGUE AND CONVERSATIONAL QUESTION ANSWERING (DIALDOC 2021), 2021, pp. 38-45
  • [5] DS-TOD: Efficient Domain Specialization for Task-Oriented Dialog
    Hung, Chia-Chien
    Lauscher, Anne
    Ponzetto, Simone Paolo
    Glavas, Goran
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, pp. 891-904
  • [6] Guided Dialog Policy Learning: Reward Estimation for Multi-Domain Task-Oriented Dialog
    Takanobu, Ryuichi
    Zhu, Hanlin
    Huang, Minlie
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, pp. 100-110
  • [7] Paraphrase Augmented Task-Oriented Dialog Generation
    Gao, Silin
    Zhang, Yichi
    Ou, Zhijian
    Yu, Zhou
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, pp. 639-649
  • [8] SIMMC-VR: A Task-oriented Multimodal Dialog Dataset with Situated and Immersive VR Streams
    Wu, Te-Lin
    Kottur, Satwik
    Madotto, Andrea
    Azab, Mahmoud
    Rodriguez, Pedro
    Damavandi, Babak
    Peng, Nanyun
    Moon, Seungwhan
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, pp. 6273-6291
  • [9] Multi2WOZ: A Robust Multilingual Dataset and Conversational Pretraining for Task-Oriented Dialog
    Hung, Chia-Chien
    Lauscher, Anne
    Vulic, Ivan
    Ponzetto, Simone Paolo
    Glavas, Goran
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, pp. 3687-3703
  • [10] Cyberbullying Detection in Code-Mixed Languages: Dataset and Techniques
    Maity, Krishanu
    Saha, Sriparna
    Bhattacharyya, Pushpak
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, pp. 1692-1698