A Multilingual and Multidomain Study on Dialog Act Recognition Using Character-Level Tokenization

Cited by: 10

Authors:

Ribeiro, Eugenio [1, 2]

Ribeiro, Ricardo [1, 3]

de Matos, David Martins [1, 2]

Affiliations:

[1] INESC ID, Spoken Language Syst Lab L2F, P-1000029 Lisbon, Portugal

[2] Univ Lisbon, Inst Super Tecn, P-1049001 Lisbon, Portugal

[3] Inst Univ Lisboa ISCTE IUL, P-1649026 Lisbon, Portugal

Source:

INFORMATION | 2019 / Vol. 10 / No. 03

Keywords:

dialog act recognition; character-level; multilinguality; multidomain

DOI:

10.3390/info10030094

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Automatic dialog act recognition is an important step for dialog systems, since it reveals the intention behind the words uttered by their conversational partners. Although most approaches to the task use word-level tokenization, there is information at the sub-word level that is related to the function of the words and, consequently, to their intention. Thus, in this study, we explored the use of character-level tokenization to capture that information. We used multiple character windows of different sizes to capture morphological aspects, such as affixes and lemmas, as well as inter-word information. Furthermore, we assessed the importance of punctuation and capitalization for the task. To broaden the conclusions of our study, we performed experiments on dialogs in three languages with different morphological characteristics: English, Spanish, and German. In addition, the dialogs cover multiple domains and are annotated with both domain-dependent and domain-independent dialog act labels. The results show not only that the character-level approach leads to similar or better performance than the state-of-the-art word-level approaches on the task, but also that both approaches capture complementary information. Thus, the best results are achieved by combining tokenization at both levels.
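The abstract describes sliding character windows of several sizes over each utterance, toggling punctuation and capitalization, and combining character-level with word-level tokens. The sketch below is a minimal illustration under stated assumptions, not the authors' model (which this record does not describe): it only enumerates the windows as counted n-gram features. The window sizes (3, 5, 7), the `c{n}:`/`w:` feature prefixes, and all function names are hypothetical, and the `keep_case`/`keep_punct` switches merely mimic the capitalization and punctuation comparisons mentioned above.

```python
import re
from collections import Counter


def normalize(utterance: str, keep_case: bool = True, keep_punct: bool = True) -> str:
    """Optionally lowercase and strip punctuation.

    These switches are hypothetical stand-ins for the capitalization and
    punctuation comparisons mentioned in the abstract.
    """
    text = utterance if keep_case else utterance.lower()
    if not keep_punct:
        text = re.sub(r"[^\w\s]", "", text)
    # Collapse whitespace runs so windows never contain repeated spaces.
    return re.sub(r"\s+", " ", text).strip()


def char_windows(text: str, sizes=(3, 5, 7)) -> Counter:
    """Count every contiguous character window of each (assumed) size.

    Windows run over the whole utterance, spaces included, so larger windows
    can span a word boundary (inter-word information), while smaller ones
    tend to isolate affixes and stems.
    """
    counts = Counter()
    for n in sizes:
        for i in range(len(text) - n + 1):
            counts[f"c{n}:{text[i:i + n]}"] += 1
    return counts


def word_tokens(text: str) -> Counter:
    """Word-level tokens (punctuation marks count as separate tokens)."""
    return Counter(f"w:{tok}" for tok in re.findall(r"\w+|[^\w\s]", text))


def features(utterance: str, sizes=(3, 5, 7), **norm) -> Counter:
    """Combine character-level and word-level features into one bag.

    The prefixes keep the two vocabularies apart, so adding the Counters
    simply merges both views of the same utterance.
    """
    text = normalize(utterance, **norm)
    return char_windows(text, sizes) + word_tokens(text)


if __name__ == "__main__":
    bag = features("Is it okay?", keep_case=False, keep_punct=False)
    for feat, count in sorted(bag.items()):
        print(feat, count)
```

For the lowercased, punctuation-free utterance "is it okay", the size-3 windows include " it" and "it ", which straddle word boundaries, alongside the word tokens `w:is`, `w:it`, and `w:okay`. Turning such bags into vectors for a classifier is left open here.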


Pages: 19

50 records in total

[31] Word Game Modeling Using Character-Level N-Gram and Statistics
Mattiev, Jamolbek
Salaev, Ulugbek
Kavsek, Branko
MATHEMATICS, 2023, 11 (06)
[32] Malicious and Benign URL Dataset Generation Using Character-Level LSTM Models
Vecile, Spencer
Lacroix, Kyle
Grolinger, Katarina
Samarabandu, Jagath
2022 5TH IEEE CONFERENCE ON DEPENDABLE AND SECURE COMPUTING (IEEE DSC 2022), 2022
[33] Named-Entity Recognition in Sports Field Based on a Character-Level Graph Convolutional Network
Seti, Xieraili
Wumaier, Aishan
Yibulayin, Turgen
Paerhati, Diliyaer
Wang, Lulu
Saimaiti, Alimu
INFORMATION, 2020, 11 (01)
[34] Improving Named Entity Recognition in Vietnamese Texts by a Character-Level Deep Lifelong Learning Model
Ngoc-Vu Nguyen
Thi-Lan Nguyen
Cam-Van Nguyen Thi
Mai-Vu Tran
Tri-Thanh Nguyen
Quang-Thuy Ha
VIETNAM JOURNAL OF COMPUTER SCIENCE, 2019, 6 (04) : 471 - 487
[35] Multilingual POS tagging by a composite deep architecture based on character-level features and on-the-fly enriched Word Embeddings
Pota, Marco
Marulli, Fiammetta
Esposito, Massimo
De Pietro, Giuseppe
Fujita, Hamido
KNOWLEDGE-BASED SYSTEMS, 2019, 164 : 309 - 323
[36] Text steganography: a novel character-level embedding algorithm using font attribute
Ramakrishnan, Bala Krishnan
Thandra, Prasanth Kumar
Srinivasula, A. V. Satya Murty
SECURITY AND COMMUNICATION NETWORKS, 2016, 9 (18) : 6066 - 6079
[37] Performance evaluation of character-level CNNs using tweet data and analysis for weight perturbations
Miyazaki, Kazuteru
Ida, Masaaki
ARTIFICIAL LIFE AND ROBOTICS, 2024, 29 (02) : 266 - 273
[38] Character-Level Dialect Identification in Arabic Using Long Short-Term Memory
Sayadi, Karim
Hamidi, Mansour
Bui, Marc
Liwicki, Marcus
Fischer, Andreas
COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, CICLING 2017, PT II, 2018, 10762 : 324 - 337
[39] Automatically Classifying Chinese Judgment Documents Using Character-Level Convolutional Neural Networks
Zhou, Xiaosong
Li, Chuanyi
Ge, Jidong
Li, Zhongjin
Zhou, Xiaoyu
Luo, Bin
PRICAI 2018: TRENDS IN ARTIFICIAL INTELLIGENCE, PT II, 2018, 11013 : 430 - 437
[40] Consistency Assessment between Diploma Policy and Curriculum Policy using Character-level CNN
Miyazaki, Kazuteru
Ida, Masaaki
2018 JOINT 10TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 19TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS), 2018 : 626 - 631
