A Multilingual and Multidomain Study on Dialog Act Recognition Using Character-Level Tokenization

Cited by: 10
Authors
Ribeiro, Eugenio [1, 2]
Ribeiro, Ricardo [1, 3]
de Matos, David Martins [1, 2]
Affiliations
[1] INESC ID, Spoken Language Syst Lab L2F, P-1000029 Lisbon, Portugal
[2] Univ Lisbon, Inst Super Tecn, P-1049001 Lisbon, Portugal
[3] Inst Univ Lisboa ISCTE IUL, P-1649026 Lisbon, Portugal
Keywords
dialog act recognition; character-level; multilinguality; multidomain
DOI
10.3390/info10030094
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Automatic dialog act recognition is an important step for dialog systems, since it reveals the intention behind the words uttered by the system's conversational partners. Although most approaches to the task use word-level tokenization, there is information at the sub-word level that is related to the function of the words and, consequently, to the intention behind them. Thus, in this study, we explored character-level tokenization as a way to capture that information. We used multiple character windows of different sizes to capture morphological aspects, such as affixes and lemmas, as well as inter-word information. We also assessed the importance of punctuation and capitalization for the task. To broaden the conclusions of our study, we performed experiments on dialogs in three languages with different morphological characteristics: English, Spanish, and German. The dialogs cover multiple domains and are annotated with both domain-dependent and domain-independent dialog act labels. The results show not only that the character-level approach leads to similar or better performance than state-of-the-art word-level approaches, but also that the two approaches capture complementary information. Thus, the best results are achieved by combining tokenization at both levels.
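
As a rough illustration of what character-level tokenization with several window sizes can look like, the sketch below slides fixed-width windows over an utterance and optionally strips capitalization and punctuation. It is not the authors' model: the function name `char_windows`, the default window sizes, and the two flags are assumptions made for this example, and the record above does not state how the windows are consumed downstream.

```python
import string


def char_windows(utterance, sizes=(3, 5, 7), keep_case=True, keep_punct=True):
    """Map each window size to the list of character windows over the utterance.

    The sizes and flags are illustrative assumptions, not values from the paper.
    """
    text = utterance if keep_case else utterance.lower()
    if not keep_punct:
        # Drop ASCII punctuation only; spaces are kept so that windows
        # can still span word boundaries (inter-word information).
        text = "".join(ch for ch in text if ch not in string.punctuation)
    # A size larger than the text yields an empty list for that size.
    return {n: [text[i:i + n] for i in range(len(text) - n + 1)] for n in sizes}
```

Short windows tend to cover affix-sized fragments, while wider ones can cover whole short words or cross a space between two words. Calling the function with `keep_case=False` or `keep_punct=False` gives a simple way to compare inputs with and without capitalization and punctuation.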
Pages: 19