A Multilingual and Multidomain Study on Dialog Act Recognition Using Character-Level Tokenization

Cited by: 10
Authors
Ribeiro, Eugenio [1 ,2 ]
Ribeiro, Ricardo [1 ,3 ]
de Matos, David Martins [1 ,2 ]
Affiliations
[1] INESC ID, Spoken Language Syst Lab L2F, P-1000029 Lisbon, Portugal
[2] Univ Lisbon, Inst Super Tecn, P-1049001 Lisbon, Portugal
[3] Inst Univ Lisboa ISCTE IUL, P-1649026 Lisbon, Portugal
Keywords
dialog act recognition; character-level; multilinguality; multidomain;
DOI
10.3390/info10030094
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Automatic dialog act recognition is an important step for dialog systems, since it reveals the intention behind the words uttered by their conversational partners. Although most approaches to the task use word-level tokenization, there is information at the sub-word level that is related to the function of the words and, consequently, their intention. Thus, in this study, we explored the use of character-level tokenization to capture that information. We used multiple character windows of different sizes to capture morphological aspects, such as affixes and lemmas, as well as inter-word information. Furthermore, we assessed the importance of punctuation and capitalization for the task. To broaden the conclusions of our study, we performed experiments on dialogs in three languages (English, Spanish, and German), which have different morphological characteristics. The dialogs also cover multiple domains and are annotated with both domain-dependent and domain-independent dialog act labels. The results show not only that the character-level approach leads to similar or better performance than the state-of-the-art word-level approaches on the task, but also that the two approaches capture complementary information. Thus, the best results are achieved by combining tokenization at both levels.
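As a rough illustration of what character windows of different sizes can expose, the following minimal sketch slides fixed-width windows over an utterance. It is not the authors' model: the function name `char_windows`, the window sizes, and the `keep_case` / `keep_punct` switches are assumptions made for illustration, and the sketch only extracts raw character n-grams rather than learning representations from them.

```python
import string


def char_windows(utterance, sizes=(3, 5, 7), keep_case=True, keep_punct=True):
    """Return {size: [windows]} of sliding character windows over an utterance.

    Hypothetical helper: the sizes and the two switches are illustrative
    assumptions, not settings taken from the paper.
    """
    text = utterance
    if not keep_case:
        # Drop capitalization cues.
        text = text.lower()
    if not keep_punct:
        # Drop ASCII punctuation cues such as "?" or "!".
        text = "".join(ch for ch in text if ch not in string.punctuation)
    # One list of overlapping windows per size; empty if the text is too short.
    return {n: [text[i:i + n] for i in range(len(text) - n + 1)] for n in sizes}


windows = char_windows("Is it raining?", sizes=(3, 5))
print(windows[3])
```

In this example, short windows such as "ing" resemble affixes, windows such as "s i" span a word boundary, and "ng?" keeps the punctuation cue that disappears when `keep_punct=False`.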
Pages: 19
Related papers
50 in total
  • [41] Dialog-Act Recognition Using Discourse and Sentence Structure Information
    Zhou, Keyan
    Zong, Chengqing
    2009 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2009, : 11 - 16
  • [42] Offensive Sentence Classification Using Character-Level CNN and Transfer Learning with Fake Sentences
    Seo, Suin
    Cho, Sung-Bae
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT II, 2017, 10635 : 532 - 539
  • [43] Chinese Q&A Community Medical Entity Recognition with Character-Level Features and Self-Attention Mechanism
    Han, Pu
    Zhang, Mingtao
    Shi, Jin
    Yang, Jinming
    Li, Xiaoyan
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2021, 29 (01): : 55 - 72
  • [44] Chinese Character-level Writer Identification using Path Signature Feature, DropStroke and Deep CNN
    Yang, Weixin
    Jin, Lianwen
    Liu, Manfei
    2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2015, : 546 - 550
  • [45] Construction of consistency judgment system of diploma policy and curriculum policy using character-level CNN
    Miyazaki K.
    Ida M.
    IEEJ Transactions on Electronics, Information and Systems, 2019, 139 (10) : 1119 - 1127
  • [46] Applying a Character-Level Model to a Short Arabic Dialect Sentence: A Saudi Dialect as a Case Study
    Alqurashi, Tahani
    APPLIED SCIENCES-BASEL, 2022, 12 (23):
  • [47] Categorization of free-text drug orders using character-level recurrent neural networks
    Raiskin, Yarden
    Eickhoff, Carsten
    Beeler, Patrick E.
    INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2019, 129 : 20 - 28
  • [48] Construction of consistency judgment system of diploma policy and curriculum policy using character-level CNN
    Miyazaki, Kazuteru
    Ida, Masaaki
    ELECTRONICS AND COMMUNICATIONS IN JAPAN, 2019, 102 (12) : 30 - 39
  • [49] End-to-End Recurrent Neural Network Models for Vietnamese Named Entity Recognition: Word-Level Vs. Character-Level
    Thai-Hoang Pham
    Phuong Le-Hong
    COMPUTATIONAL LINGUISTICS, PACLING 2017, 2018, 781 : 219 - 232
  • [50] Deep Dialog Act Recognition using Multiple Token, Segment, and Context Information Representations
    Ribeiro, Eugenio
    Ribeiro, Ricardo
    de Matos, David Martins
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2019, 66 : 861 - 899