A Chinese word segmentation model for energy literature based on Conditional Random Fields

Cited by: 0
Authors
Zhao, Liujun [1]
Kong, Weizheng [1]
Chai, Bo [2, 3]
Affiliations
[1] State Grid Corp China, State Grid Energy Res Inst CO Ltd, Beijing 102209, Peoples R China
[2] Global Energy Interconnect Res Inst, Beijing 102211, Peoples R China
[3] Artificial Intelligence Elect Power Syst State Gr, Beijing 102211, Peoples R China
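Keywords
Chinese word segmentation; State Grid Energy Literature; conditional random fields; conditional entropy
DOI
Not available
Chinese Library Classification (CLC) number
TE [Petroleum and Natural Gas Industry]; TK [Energy and Power Engineering]
Discipline classification code
0807; 0820
Abstract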
Chinese word segmentation is one of the foundational and core tasks of Chinese natural language processing. Although Chinese word segmentation systems for general domains have made progress, they are still far from meeting practical requirements in the energy domain. We focus on the Chinese word segmentation standard and segmentation technology in the energy domain, which consists of 13283 basic energy terms. This paper first proposes a conditional random field segmentation model. Then, the character features, character-type features and conditional entropy features that influence word segmentation performance are chosen and described. Finally, the proposed model is tested on a dataset of State Grid energy literature and compared with current word segmentation tools, such as the Harbin Institute of Technology's Language Technology Platform (LTP) and Tsinghua University's THU Lexical Analyzer for Chinese (THULAC). The best F1 value achieved by the proposed model is 0.8319.
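The abstract names three feature families (character, character type, conditional entropy) but gives no templates. The sketch below is only an illustration of how such per-character features are commonly built for a CRF segmenter. The B/M/E/S tag scheme, the coarse type classes, the window size, and the choice of H(next character | current character) as the entropy feature are all assumptions, and every function name is hypothetical rather than taken from the paper.

```python
import math
from collections import Counter, defaultdict


def bmes_tags(words):
    """Convert a list of segmented words into per-character B/M/E/S tags."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def char_type(ch):
    """Coarse character type (assumed classes, not the paper's)."""
    if ch.isdigit():
        return "DIGIT"
    if ch.isascii() and ch.isalpha():
        return "LATIN"
    if "\u4e00" <= ch <= "\u9fff":
        return "HAN"
    return "OTHER"


def successor_entropy(sentences):
    """Estimate H(next char | current char) in bits from raw text."""
    following = defaultdict(Counter)
    for s in sentences:
        for cur, nxt in zip(s, s[1:]):
            following[cur][nxt] += 1
    entropy = {}
    for cur, counts in following.items():
        total = sum(counts.values())
        entropy[cur] = -sum(
            c / total * math.log2(c / total) for c in counts.values()
        )
    return entropy


def char_features(sentence, i, entropy):
    """Feature dict for position i: a +/-1 character window (assumed size)."""
    ch = sentence[i]
    prev_ch = sentence[i - 1] if i > 0 else "<BOS>"
    next_ch = sentence[i + 1] if i < len(sentence) - 1 else "<EOS>"
    return {
        "c0": ch,
        "c-1": prev_ch,
        "c+1": next_ch,
        "c-1c0": prev_ch + ch,
        "c0c+1": ch + next_ch,
        "type": char_type(ch),
        # Entropy is rounded to an integer bucket so it acts as a discrete feature.
        "entropy_bin": str(round(entropy.get(ch, 0.0))),
    }
```

A high successor entropy suggests that many different characters can follow the current one, which is a common cue for a word boundary. The resulting feature dictionaries and tag sequences could be passed to any CRF toolkit that accepts per-token feature dictionaries.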
Pages: 785-788
Page count: 4
Related papers
50 results in total
  • [1] Chinese Word Segmentation based on Conditional Random Fields with Character Clustering
    Du, Liping
    Li, Xiaoge
    Liu, Chunli
    Liu, Rui
    Fan, Xian
    Yang, Jianing
    Lin, Dayi
    Wei, Mian
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2016, : 258 - 261
  • [2] A Conditional Random Fields Model for Overlapping Ambiguity Resolution in Chinese Word Segmentation
    Liang, Yan
    Zhu, Yaoting
    2009 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING (GRC 2009), 2009, : 384 - +
  • [3] Domain dependent word segmentation based on conditional random fields
    Fukuda, Takuya
    Izumi, Masataka
    Miura, Takao
    2007 IEEE PACIFIC RIM CONFERENCE ON COMMUNICATIONS, COMPUTERS AND SIGNAL PROCESSING, VOLS 1 AND 2, 2007, : 264 - 267
  • [4] Word segmentation using domain knowledge based on conditional random fields
    Fukuda, Takuya
    Izumi, Masataka
    Miura, Takao
    19TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, VOL II, PROCEEDINGS, 2007, : 436 - 439
  • [5] Scaling conditional random field with application to Chinese word segmentation
    Zhao, Hai
    Kit, Chunyu
    ICNC 2007: THIRD INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, VOL 5, PROCEEDINGS, 2007, : 95 - +
  • [6] Object Segmentation Based on Gaussian Mixture Model and Conditional Random Fields
    Qi, Yali
    Zhang, Guoshan
    Li, Yeli
    2016 IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION (ICIA), 2016, : 900 - 904
  • [7] Exploiting Unlabeled Internal Data in Conditional Random Fields to Reduce Word Segmentation Errors for Chinese Texts
    Tsai, Richard Tzong-Han
    Hung, Hsi-Chuan
    Dai, Hong-Jie
    Hsu, Wen-Lian
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2944 - 2947
  • [8] Recognition of Internet word based on conditional random fields
    Hu, Y. (huyong@scu.edu.cn), 1600, Binary Information Press, Flat F 8th Floor, Block 3, Tanner Garden, 18 Tanner Road, Hong Kong (11):
  • [9] Sense Group Segmentation for Chinese Second Language Reading Based on Conditional Random Fields
    Zhu, Shuqin
    Song, Jihua
    Peng, Weiming
    Sun, Jingbo
    CHINESE LEXICAL SEMANTICS, CLSW 2018, 2018, 11173 : 559 - 569
  • [10] Chinese Unknown Word Recognition using improved Conditional Random Fields
    Xu, Yisu
    Wang, Xuan
    Tang, Buzhou
    Wang, Xiaolong
    ISDA 2008: EIGHTH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, VOL 2, PROCEEDINGS, 2008, : 363 - 367