Feature selection techniques for maximum entropy based biomedical named entity recognition

Cited by: 73
Authors
Saha, Sujan Kumar [1]
Sarkar, Sudeshna [1]
Mitra, Pabitra [1]
Affiliation
[1] Indian Inst Technol, Dept Comp Sci & Engn, Kharagpur 721302, W Bengal, India
Keywords
Biomedical named entity recognition; Feature selection; Feature reduction; Maximum entropy classifier; Machine learning; KNOWLEDGE;
DOI
10.1016/j.jbi.2008.12.012
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Named entity recognition is an important and fundamental task of biomedical text mining. Biomedical named entities include mentions of proteins, genes, DNA, RNA, etc., which often have complex structures, making such entities challenging to identify and classify. Machine learning methods like CRF, MEMM and SVM have been widely used for learning to recognize such entities from an annotated corpus. The identification of appropriate feature templates and the selection of the important feature values play a very important role in the success of these methods. In this paper, we provide a study on word clustering and selection based feature reduction approaches for named entity recognition using a maximum entropy classifier. The identification and selection of features are largely done automatically without using domain knowledge. The performance of the system is found to be superior to existing systems which do not use domain knowledge. (C) 2009 Elsevier Inc. All rights reserved.
Pages: 905 - 911
Number of pages: 7
Related papers
50 in total
  • [1] Multiobjective Approach for Feature Selection in Maximum Entropy based Named Entity Recognition
    Ekbal, Asif
    Saha, Sriparna
    Hasanuzzaman, Md
    [J]. 22ND INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2010), PROCEEDINGS, VOL 1, 2010,
  • [2] Improving feature extraction in named entity recognition based on maximum entropy model
    Jiang, Wei
    Guan, Yi
    Wang, Xiao-Long
    [J]. PROCEEDINGS OF 2006 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2006, : 2630 - +
  • [3] A probabilistic feature based Maximum Entropy model for Chinese named entity recognition
    Zhang, Suxiang
    Wang, Xiaojie
    Wen, Juan
    Qin, Ying
    Zhong, Yixin
    [J]. COMPUTER PROCESSING OF ORIENTAL LANGUAGES, PROCEEDINGS: BEYOND THE ORIENT: THE RESEARCH CHALLENGES AHEAD, 2006, 4285 : 189 - +
  • [4] Feature Importance for Biomedical Named Entity Recognition
    Huggard, Hamish
    Zhang, Aaron
    Zhang, Edmond
    Koh, Yun Sing
    [J]. AI 2019: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, 11919 : 406 - 417
  • [5] ANERsys: An Arabic Named Entity Recognition system based on maximum entropy
    Benajiba, Yassine
    Rosso, Paolo
    Ruiz, Jose Miguel Benedi
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2007, 4394 : 143 - +
  • [6] Method of Chinese Named Entity Recognition Based on Maximum Entropy Model
    Ning Hui
    Yang Hua
    Tan Ya-zhou
    Wu Hao
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON MECHATRONICS AND AUTOMATION, VOLS 1-7, CONFERENCE PROCEEDINGS, 2009, : 2472 - 2477
  • [7] Maximum Entropy Named Entity Recognition for Czech Language
    Konkol, Michal
    Konopik, Miloslav
    [J]. TEXT, SPEECH AND DIALOGUE, TSD 2011, 2011, 6836 : 203 - 210
  • [8] Hungarian named entity recognition with a maximum entropy approach
    Varga, Daniel
    Simon, Eszter
    [J]. ACTA CYBERNETICA, 2007, 18 (02): : 293 - 301
  • [9] Classifier subset selection for biomedical named entity recognition
    Dimililer, Nazife
    Varoglu, Ekrem
    Altincay, Hakan
    [J]. APPLIED INTELLIGENCE, 2009, 31 (03) : 267 - 282