Punctuation Prediction for Vietnamese Texts Using Conditional Random Fields

Cited by: 2
Authors
Pham, Quang H. [1]
Nguyen, Binh T. [2]
Nguyen Viet Cuong [3]
Affiliations
[1] Singapore Management Univ, Sch Informat Syst, Singapore, Singapore
[2] VNU HCM Univ Sci, AISIA Res Lab, Ho Chi Minh City, Vietnam
[3] Univ Cambridge, Dept Engn, Cambridge, England
Keywords
punctuation prediction; Vietnamese language; conditional random field; sequence labeling
DOI
10.1145/3368926.3369716
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
We investigate punctuation prediction for the Vietnamese language. This problem matters because a solution can add suitable punctuation marks to machine-transcribed speech, which usually lacks them. Following previous work on English and Chinese, we formulate the task as a sequence labeling problem. We then apply a conditional random field model and propose a set of features that are useful for prediction. Moreover, we build two corpora from Vietnamese online news and movie subtitles and perform extensive experiments on these data. Finally, we ask four volunteers to insert punctuation marks into a small sample of our dataset. The experimental results show that the problem is challenging even for humans, and that our model achieves performance close to that of a human.
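The sequence labeling formulation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the label names (`O`, `COMMA`, `PERIOD`, `QUESTION`), the whitespace tokenization, the lowercasing, and the helper names `to_labeled_sequence` and `token_features` are all assumptions made for illustration. Each token receives a label naming the punctuation mark that follows it, and a CRF would then be trained on per-token feature dictionaries such as the ones built here.

```python
# Hypothetical label set; the paper's actual labels may differ.
PUNCT = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}


def to_labeled_sequence(text):
    """Turn punctuated text into (token, label) pairs.

    The label names the mark that follows the token, or "O" if none does.
    Tokens are lowercased to mimic unpunctuated, uncased transcripts
    (an assumption, not something stated in the abstract).
    """
    pairs = []
    for raw in text.split():
        word = raw.rstrip(",.?")
        marks = raw[len(word):]
        if not word:
            # A standalone mark is attached to the previous token.
            if pairs and pairs[-1][1] == "O":
                pairs[-1] = (pairs[-1][0], PUNCT[marks[0]])
            continue
        label = PUNCT[marks[0]] if marks else "O"
        pairs.append((word.lower(), label))
    return pairs


def token_features(tokens, i):
    """Illustrative window features for position i (assumed templates)."""
    return {
        "w0": tokens[i],
        "w-1": tokens[i - 1] if i > 0 else "<BOS>",
        "w+1": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
    }


pairs = to_labeled_sequence("Hôm nay trời đẹp, chúng ta đi chơi nhé.")
tokens = [tok for tok, _ in pairs]
labels = [lab for _, lab in pairs]
features = [token_features(tokens, i) for i in range(len(tokens))]
```

The resulting `features` and `labels` lists have the shape that common CRF toolkits accept as one training sequence; which toolkit and which feature templates the paper used is not stated in this record.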
Pages: 322-327
Number of pages: 6