A Multilingual and Multidomain Study on Dialog Act Recognition Using Character-Level Tokenization

Cited by: 10
Authors
Ribeiro, Eugenio [1, 2]
Ribeiro, Ricardo [1, 3]
de Matos, David Martins [1, 2]
Affiliations
[1] INESC-ID, Spoken Language Systems Laboratory (L2F), P-1000029 Lisbon, Portugal
[2] Universidade de Lisboa, Instituto Superior Técnico, P-1049001 Lisbon, Portugal
[3] Instituto Universitário de Lisboa (ISCTE-IUL), P-1649026 Lisbon, Portugal
Keywords
dialog act recognition; character-level; multilinguality; multidomain
DOI
10.3390/info10030094
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Automatic dialog act recognition is an important step for dialog systems, since it reveals the intention behind the words uttered by their conversational partners. Although most approaches to the task use word-level tokenization, there is information at the sub-word level that is related to the function of the words and, consequently, to their intention. Thus, in this study, we explored the use of character-level tokenization to capture that information. We used multiple character windows of different sizes to capture morphological aspects, such as affixes and lemmas, as well as inter-word information. We also assessed the importance of punctuation and capitalization for the task. To broaden the conclusions of our study, we performed experiments on dialogs in three languages with different morphological characteristics: English, Spanish, and German. Moreover, the dialogs cover multiple domains and are annotated with both domain-dependent and domain-independent dialog act labels. The results show not only that the character-level approach performs similarly to or better than the state-of-the-art word-level approaches on the task, but also that the two approaches capture complementary information. Consequently, the best results are achieved by combining tokenization at both levels.
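The character-window idea in the abstract can be illustrated with a small preprocessing sketch. It is hypothetical and is not the authors' implementation: the function names, the window sizes (3, 5, 7), and the normalization rules used for the capitalization and punctuation ablation are all assumptions. The sketch only enumerates the contiguous character windows that a model with windows of several sizes would see. Short windows tend to fall within a word and can match affixes, while longer windows cross spaces and pick up inter-word context. A neural model (for example, convolutions over character embeddings) would learn from such windows rather than use them as literal features.

```python
import string


def preprocess(utterance: str, keep_case: bool = True, keep_punct: bool = True) -> str:
    """Optionally lowercase and/or strip ASCII punctuation (assumed ablation rules)."""
    text = utterance if keep_case else utterance.lower()
    if not keep_punct:
        # str.maketrans with a third argument maps those characters to None (deletion).
        text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse any repeated whitespace left behind by punctuation removal.
    return " ".join(text.split())


def char_tokens(text: str) -> list[str]:
    """Character-level tokenization: every character, spaces included, is a token."""
    return list(text)


def char_windows(text: str, size: int) -> list[str]:
    """All contiguous character windows of one size (sliding with stride 1)."""
    return [text[i:i + size] for i in range(len(text) - size + 1)]


def multi_window(text: str, sizes: tuple[int, ...] = (3, 5, 7)) -> dict[int, list[str]]:
    """Windows for several sizes at once; the default sizes are illustrative assumptions."""
    return {size: char_windows(text, size) for size in sizes}


def word_tokens(text: str) -> list[str]:
    """Naive whitespace word-level tokenization, the second view in a combined setup."""
    return text.split()


if __name__ == "__main__":
    utterance = preprocess("Can you repeat that?")
    for size, windows in multi_window(utterance).items():
        print(size, windows[:4])
    # 3 ['Can', 'an ', 'n y', ' yo']
    # 5 ['Can y', 'an yo', 'n you', ' you ']
    # 7 ['Can you', 'an you ', 'n you r', ' you re']
    print(preprocess("Can you repeat that?", keep_case=False, keep_punct=False))
    # can you repeat that
```

Calling `preprocess` with `keep_case=False` and `keep_punct=False` yields the reduced input for a capitalization and punctuation ablation, and `word_tokens` stands in for the word-level view that a combined model would consume alongside the character windows.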
Pages: 19