Punctuation Prediction for Vietnamese Texts Using Conditional Random Fields

Cited by: 2
Authors
Pham, Quang H. [1]
Nguyen, Binh T. [2]
Nguyen Viet Cuong [3]
Affiliations
[1] Singapore Management Univ, Sch Informat Syst, Singapore, Singapore
[2] VNU HCM Univ Sci, AISIA Res Lab, Ho Chi Minh City, Vietnam
[3] Univ Cambridge, Dept Engn, Cambridge, England
Keywords
punctuation prediction; Vietnamese language; conditional random field; sequence labeling
DOI
10.1145/3368926.3369716
Chinese Library Classification number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
We investigate punctuation prediction for the Vietnamese language. This problem is important because its solution can be used to add suitable punctuation marks to machine-transcribed speech, which usually lacks such information. Following previous work on English and Chinese, we formulate the task as a sequence labeling problem. We then apply a conditional random field model to solve it and propose a set of features that are useful for prediction. Moreover, we build two corpora from Vietnamese online news and movie subtitles and perform extensive experiments on these data. Finally, we ask four volunteers to insert punctuation marks into a small sample of our dataset. The experimental results show that the problem is challenging even for humans, and that our model achieves performance close to that of a human.
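The sequence labeling formulation mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each word is labeled with the punctuation mark that follows it ("O" when none follows) and that features are drawn from a small window of neighboring words. The label names, the `window` parameter, and the functions `to_labeled_sequence` and `word_features` are hypothetical choices made for illustration; the abstract does not list the authors' actual label set or features.

```python
import re

# Hypothetical label set: one label per punctuation mark, "O" for no mark.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QMARK", "!": "EXCL"}


def to_labeled_sequence(text):
    """Convert punctuated text into (word, label) pairs.

    Each word receives the label of the punctuation mark that directly
    follows it, or "O" if no mark follows. Punctuation is then dropped,
    which mimics unpunctuated machine-transcribed speech.
    """
    tokens = re.findall(r"[^\s,.?!]+|[,.?!]", text)
    pairs = []
    for tok in tokens:
        if tok in PUNCT_LABELS:
            if pairs:  # attach the mark to the preceding word
                word, _ = pairs[-1]
                pairs[-1] = (word, PUNCT_LABELS[tok])
        else:
            pairs.append((tok.lower(), "O"))
    return pairs


def word_features(words, i, window=2):
    """Build a feature dict for position i from a window of nearby words.

    This is an assumed, generic feature template of the kind commonly
    fed to CRF toolkits, not the feature set proposed in the paper.
    """
    feats = {"bias": 1.0}
    for offset in range(-window, window + 1):
        j = i + offset
        feats[f"w[{offset:+d}]"] = words[j] if 0 <= j < len(words) else "<PAD>"
    if i + 1 < len(words):
        feats["bigram[0,+1]"] = words[i] + "_" + words[i + 1]
    return feats


if __name__ == "__main__":
    pairs = to_labeled_sequence("Xin chào, bạn khỏe không?")
    words = [w for w, _ in pairs]
    for i, (word, label) in enumerate(pairs):
        print(word, label, word_features(words, i)["w[-1]"])
```

Under these assumptions, a training example is a list of per-word feature dicts paired with a list of labels, which is the input shape most linear-chain CRF toolkits expect.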
Pages: 322-327
Number of pages: 6