A Multilingual and Multidomain Study on Dialog Act Recognition Using Character-Level Tokenization

Cited by: 10
Authors
Ribeiro, Eugenio [1, 2]
Ribeiro, Ricardo [1, 3]
de Matos, David Martins [1, 2]
Affiliations
[1] INESC ID, Spoken Language Syst Lab L2F, P-1000029 Lisbon, Portugal
[2] Univ Lisbon, Inst Super Tecn, P-1049001 Lisbon, Portugal
[3] Inst Univ Lisboa ISCTE IUL, P-1649026 Lisbon, Portugal
Keywords
dialog act recognition; character-level; multilinguality; multidomain
DOI
10.3390/info10030094
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Automatic dialog act recognition is an important step for dialog systems since it reveals the intention behind the words uttered by their conversational partners. Although most approaches to the task use word-level tokenization, there is information at the sub-word level that is related to the function of the words and, consequently, their intention. Thus, in this study, we explored the use of character-level tokenization to capture that information. We used multiple character windows of different sizes to capture morphological aspects, such as affixes and lemmas, as well as inter-word information. Furthermore, we assessed the importance of punctuation and capitalization for the task. To broaden the conclusions of our study, we performed experiments on dialogs in three languages (English, Spanish, and German), which have different morphological characteristics. In addition, the dialogs cover multiple domains and are annotated with both domain-dependent and domain-independent dialog act labels. The results show not only that the character-level approach leads to similar or better performance than the state-of-the-art word-level approaches to the task, but also that the two approaches capture complementary information. Thus, the best results are achieved by combining tokenization at both levels.
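As a rough illustration of what character-level tokenization with several window sizes can look like, the following minimal Python sketch slides fixed-width windows over the characters of an utterance and optionally drops capitalization or punctuation. It is not the authors' implementation: the function name `char_windows`, its parameters, and the window sizes are hypothetical choices made for this example, and the abstract does not specify how the windows are consumed by the classifier.

```python
import string


def char_windows(utterance, sizes=(3, 5, 7), keep_case=True, keep_punct=True):
    """Split an utterance into overlapping character windows.

    Hypothetical helper for illustration only; the window sizes are
    arbitrary assumptions, not values taken from the paper.
    Returns a dict mapping each window size to its list of windows.
    """
    text = utterance
    if not keep_case:
        # Discard capitalization information.
        text = text.lower()
    if not keep_punct:
        # Discard ASCII punctuation; spaces are kept so that windows
        # can still span word boundaries (inter-word information).
        text = "".join(ch for ch in text if ch not in string.punctuation)
    # A size larger than the text yields an empty list for that size.
    return {
        n: [text[i:i + n] for i in range(len(text) - n + 1)]
        for n in sizes
    }


if __name__ == "__main__":
    for size, windows in char_windows("Is it?", sizes=(3, 4)).items():
        print(size, windows)
```

Small windows tend to cover affix-sized fragments, while wider ones can span a whole short word or a word boundary, which is one way to read the abstract's mention of morphological and inter-word information.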
Pages: 19