A Multilingual and Multidomain Study on Dialog Act Recognition Using Character-Level Tokenization

Cited by: 10
Authors
Ribeiro, Eugenio [1, 2]
Ribeiro, Ricardo [1, 3]
de Matos, David Martins [1, 2]
Affiliations
[1] INESC ID, Spoken Language Syst Lab L2F, P-1000029 Lisbon, Portugal
[2] Univ Lisbon, Inst Super Tecn, P-1049001 Lisbon, Portugal
[3] Inst Univ Lisboa ISCTE IUL, P-1649026 Lisbon, Portugal
Keywords
dialog act recognition; character-level; multilinguality; multidomain
DOI
10.3390/info10030094
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Automatic dialog act recognition is an important step for dialog systems since it reveals the intention behind the words uttered by their conversational partners. Although most approaches to the task use word-level tokenization, there is information at the sub-word level that is related to the function of the words and, consequently, their intention. Thus, in this study, we explored the use of character-level tokenization to capture that information. We used multiple character windows of different sizes to capture morphological aspects, such as affixes and lemmas, as well as inter-word information. Furthermore, we assessed the importance of punctuation and capitalization for the task. To broaden the conclusions of our study, we performed experiments on dialogs in three languages (English, Spanish, and German), which have different morphological characteristics. Moreover, the dialogs cover multiple domains and are annotated with both domain-dependent and domain-independent dialog act labels. The results show not only that the character-level approach leads to similar or better performance than the state-of-the-art word-level approaches on the task, but also that the two approaches capture complementary information. Thus, the best results are achieved by combining tokenization at both levels.
Pages: 19
Related papers
50 entries in total
  • [21] Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching
    Hou, Wenxin
    Wang, Jindong
    Tan, Xu
    Qin, Tao
    Shinozaki, Takahiro
    INTERSPEECH 2021, 2021, : 3425 - 3429
  • [22] Chinese Named Entity Recognition with Character-Level BLSTM and Soft Attention Model
    Yin J.
    Luo S.
    Wu Z.
    Pan L.
    Journal of Beijing Institute of Technology (English Edition), 2020, 29 (01): : 60 - 71
  • [23] A Character-Level Restoration of Sukhothai Inscriptions Using The Masked Language Model
    Tongkhum, Sujitra
    Sinthupinyo, Sukree
    2023 18TH INTERNATIONAL JOINT SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND NATURAL LANGUAGE PROCESSING, ISAI-NLP, 2023,
  • [24] Deep Learning Speech Synthesis Model for Word/Character-Level Recognition in the Tamil Language
    Rajendran, Sukumar
    Raja, Kiruba Thangam
    Nagarajan, G.
    Dass, A. Stephen
    Kumar, M. Sandeep
    Jayagopal, Prabhu
    INTERNATIONAL JOURNAL OF E-COLLABORATION, 2023, 19 (04) : 20 - 20
  • [25] Sanskrit Word Segmentation Using Character-level Recurrent and Convolutional Neural Networks
    Helwig, Oliver
    Nehrdich, Sebastian
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 2754 - 2763
  • [26] Web Application Firewall using Character-level Convolutional Neural Network
    Ito, Michiaki
    Iyatomi, Hitoshi
    2018 IEEE 14TH INTERNATIONAL COLLOQUIUM ON SIGNAL PROCESSING & ITS APPLICATIONS (CSPA 2018), 2018, : 103 - 106
  • [27] A Character-Level Deep Lifelong Learning Model for Named Entity Recognition in Vietnamese Text
    Ngoc-Vu Nguyen
    Thi-Lan Nguyen
    Cam-Van Nguyen Thi
    Mai-Vu Tran
    Quang-Thuy Ha
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2019, PT I, 2019, 11431 : 90 - 102
  • [28] Handwritten numeral string recognition: Character-level vs. string-level classifier training
    Liu, CL
    Marukawa, K
    PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, 2004, : 405 - 408
  • [29] ArCAR: A Novel Deep Learning Computer-Aided Recognition for Character-Level Arabic Text Representation and Recognition
    Muaad, Abdullah Y.
    Jayappa, Hanumanthappa
    Al-antari, Mugahed A.
    Lee, Sungyoung
    ALGORITHMS, 2021, 14 (07)
  • [30] Joint dialog act segmentation and recognition in human conversations using attention to dialog context
    Zhao, Tianyu
    Kawahara, Tatsuya
    COMPUTER SPEECH AND LANGUAGE, 2019, 57 : 108 - 127