Conditional Random Fields for Word Hyphenation

Cited by: 0
Authors
Trogkanis, Nikolaos [1]
Elkan, Charles [1]
Affiliations
[1] Univ Calif San Diego, Comp Sci & Engn, La Jolla, CA 92093 USA
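Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes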
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Finding allowable places in words to insert hyphens is an important practical problem. The algorithm that is used most often nowadays has remained essentially unchanged for 25 years. This method is the TeX hyphenation algorithm of Knuth and Liang. We present here a hyphenation method that is clearly more accurate. The new method is an application of conditional random fields. We create new training sets for English and Dutch from the CELEX European lexical resource, and achieve error rates for English of less than 0.1% for correctly allowed hyphens, and less than 0.01% for Dutch. Experiments show that both the Knuth/Liang method and a leading current commercial alternative have error rates several times higher for both languages.
Pages: 366-374
Number of pages: 9
Related papers
50 in total
  • [1] Discriminative Word Alignment with Conditional Random Fields
    Blunsom, Phil
    Cohn, Trevor
    [J]. COLING/ACL 2006, VOLS 1 AND 2, PROCEEDINGS OF THE CONFERENCE, 2006, : 65 - 72
  • [2] CRANDEM: Conditional Random Fields for Word Recognition
    Morris, Jeremy
    Fosler-Lussier, Eric
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 3035 - 3038
  • [3] Handwritten word recognition using conditional random fields
    Shetty, Shravya
    Srinivasan, Harish
    Srihari, Sargur
    [J]. ICDAR 2007: NINTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS I AND II, PROCEEDINGS, 2007, : 1098 - 1102
  • [4] Recognition of Internet word based on conditional random fields
    [J]. Hu, Y. (huyong@scu.edu.cn), 1600, Binary Information Press, Flat F 8th Floor, Block 3, Tanner Garden, 18 Tanner Road, Hong Kong (11):
  • [5] Domain dependent word segmentation based on conditional random fields
    Fukuda, Takuya
    Izumi, Masataka
    Miura, Takao
    [J]. 2007 IEEE PACIFIC RIM CONFERENCE ON COMMUNICATIONS, COMPUTERS AND SIGNAL PROCESSING, VOLS 1 AND 2, 2007, : 264 - 267
  • [6] Word Boundary Identification for Myanmar Text Using Conditional Random Fields
    Pa, Win Pa
    Thu, Ye Kyaw
    Finch, Andrew
    Sumita, Eiichiro
    [J]. GENETIC AND EVOLUTIONARY COMPUTING, VOL II, 2016, 388 : 447 - 456
  • [7] Chinese Unknown Word Recognition using improved Conditional Random Fields
    Xu, Yisu
    Wang, Xuan
    Tang, Buzhou
    Wang, Xiaolong
    [J]. ISDA 2008: EIGHTH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, VOL 2, PROCEEDINGS, 2008, : 363 - 367
  • [8] Chinese Word Segmentation based on Conditional Random Fields with Character Clustering
    Du, Liping
    Li, Xiaoge
    Liu, Chunli
    Liu, Rui
    Fan, Xian
    Yang, Jianing
    Lin, Dayi
    Wei, Mian
    [J]. PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2016, : 258 - 261
  • [9] Semantic Parsing Using Word Confusion Networks With Conditional Random Fields
    Tur, Gokhan
    Deoras, Anoop
    Hakkani-Tuer, Dilek
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2578 - 2582
  • [10] Word segmentation using domain knowledge based on conditional random fields
    Fukuda, Takuya
    Izumi, Masataka
    Miura, Takao
    [J]. 19TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, VOL II, PROCEEDINGS, 2007, : 436 - 439