Hadoop Recognition of Biomedical Named Entity Using Conditional Random Fields

Cited by: 55
Authors
Li, Kenli [1 ,3 ]
Ai, Wei [1 ]
Tang, Zhuo [1 ]
Zhang, Fan [2 ]
Jiang, Lingang [1 ]
Li, Keqin [4 ]
Hwang, Kai [5 ]
Affiliations
[1] Hunan Univ, Coll Informat Sci & Engn, Changsha 410082, Hunan, Peoples R China
[2] MIT, Kavli Inst Astrophys & Space Res, Cambridge, MA 02139 USA
[3] SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY 12561 USA
[4] Hunan Univ, Coll Informat Sci & Engn, Changsha 410082, Hunan, Peoples R China
[5] Univ So Calif, Dept Elect Engn, Los Angeles, CA 90089 USA
Funding
National Natural Science Foundation of China
Keywords
Biomedical named entity recognition; conditional random fields; MapReduce; parallel algorithm; FEATURES;
DOI
10.1109/TPDS.2014.2368568
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Processing large volumes of data has presented a challenging issue, particularly in data-redundant systems. As one of the most recognized models, the conditional random fields (CRF) model has been widely applied in biomedical named entity recognition (Bio-NER). Due to the internally sequential feature, performance improvement of the CRF model is nontrivial, which requires new parallelized solutions. By combining and parallelizing the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) and Viterbi algorithms, we propose a parallel CRF algorithm called MapReduce CRF (MRCRF) in this paper, which contains two parallel sub-algorithms to handle two time-consuming steps of the CRF model. The MapReduce L-BFGS (MRLB) algorithm leverages the MapReduce framework to enhance the capability of estimating parameters. Furthermore, the MapReduce Viterbi (MRVtb) algorithm infers the most likely state sequence by extending the Viterbi algorithm with another MapReduce job. Experimental results show that the MRCRF algorithm outperforms other competing methods by exhibiting significant performance improvement in terms of time efficiency as well as preserving a guaranteed level of correctness.
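The abstract does not say how MRVtb splits the decoding work, so the following is only a minimal sketch of one possible decomposition, assuming each sentence is decoded independently in a map step and the results are collected in a reduce step. The function names (`viterbi`, `map_decode`, `reduce_collect`, `decode_corpus`) and the score-matrix layout are hypothetical and are not taken from the paper; plain Python `map` and `reduce` stand in for a MapReduce framework.

```python
from functools import reduce


def viterbi(obs_scores, trans_scores):
    """Return the highest-scoring state path for one sequence.

    obs_scores[t][s]   : score of state s at position t (assumed layout)
    trans_scores[p][s] : score of moving from state p to state s
    """
    n_states = len(trans_scores)
    best = list(obs_scores[0])  # best score of any path ending in each state
    back = []                   # back-pointers, one list per position t >= 1
    for t in range(1, len(obs_scores)):
        new_best, pointers = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at position t.
            prev = max(range(n_states), key=lambda p: best[p] + trans_scores[p][s])
            pointers.append(prev)
            new_best.append(best[prev] + trans_scores[prev][s] + obs_scores[t][s])
        back.append(pointers)
        best = new_best
    # Follow back-pointers from the best final state.
    state = max(range(n_states), key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def map_decode(record, trans_scores):
    """Map step (hypothetical): decode one (sentence_id, obs_scores) record."""
    sentence_id, obs_scores = record
    return sentence_id, viterbi(obs_scores, trans_scores)


def reduce_collect(acc, pair):
    """Reduce step (hypothetical): gather decoded paths keyed by sentence id."""
    acc[pair[0]] = pair[1]
    return acc


def decode_corpus(records, trans_scores):
    mapped = map(lambda r: map_decode(r, trans_scores), records)
    return reduce(reduce_collect, mapped, {})


if __name__ == "__main__":
    trans = [[0, -1], [-1, 0]]
    corpus = [("s1", [[2, 0], [0, 1], [0, 1]]), ("s2", [[0, 3]])]
    print(decode_corpus(corpus, trans))
```

Because each sentence is decoded without reference to any other, the map step needs no shared state beyond the transition scores, which is what would make this kind of split straightforward to distribute.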
Pages: 3040 - 3051
Page count: 12
Related papers
50 in total
  • [31] Lao Named Entity Recognition based on Conditional Random Fields with Simple Heuristic Information
    Yang, Mengjie
    Zhou, Lanjiang
    Yu, Zhengtao
    Gao, Shengxiang
    Guo, Jianyi
    [J]. 2015 12TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD), 2015, : 1426 - 1431
  • [32] Named Entity Recognition with Conditional Random Fields on Turkish News Dataset: Revisiting the Features
    Cekinel, Recep Firat
    Agriman, Mustafa
    Karagoz, Pinar
    Yilmaz, Burcu
    [J]. 2019 27TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2019,
  • [33] Incorporating dictionary features into conditional random fields for gene/protein named entity recognition
    Lin, Hongfei
    Li, Yanpeng
    Yang, Zhihao
    [J]. EMERGING TECHNOLOGIES IN KNOWLEDGE DISCOVERY AND DATA MINING, 2007, 4819 : 162 - 173
  • [34] Extending Hybrid Conditional Random Fields Approach of Named Entity Recognition for Marathi Tweets
    Patawar, Maithilee L.
    Potey, M. A.
    [J]. 2016 INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION CONTROL AND AUTOMATION (ICCUBEA), 2016,
  • [35] Medical Entity Recognition in Twitter using Conditional Random Fields
    Komariah, Kokoy Siti
    Shin, Bong-Kee
    [J]. 2021 INTERNATIONAL CONFERENCE ON ELECTRONICS, INFORMATION, AND COMMUNICATION (ICEIC), 2021,
  • [36] Early results for chinese named entity recognition using conditional random fields model, HMM and maximum entropy
    Feng, YY
    Sun, L
    Zhang, JL
    [J]. PROCEEDINGS OF THE 2005 IEEE INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING (IEEE NLP-KE'05), 2005, : 549 - 552
  • [37] Cybersecurity Named Entity Recognition Using Bidirectional Long Short-Term Memory with Conditional Random Fields
    Ma, Pingchuan
    Jiang, Bo
    Lu, Zhigang
    Li, Ning
    Jiang, Zhengwei
    [J]. TSINGHUA SCIENCE AND TECHNOLOGY, 2021, 26 (03) : 259 - 265
  • [38] A combination of active learning and self-learning for named entity recognition on Twitter using conditional random fields
    Van Cuong Tran
    Ngoc Thanh Nguyen
    Fujita, Hamido
    Dinh Tuyen Hoang
    Hwang, Dosam
    [J]. KNOWLEDGE-BASED SYSTEMS, 2017, 132 : 179 - 187
  • [39] Named Entity Recognition in Text Documents Using a Modified Conditional Random Field
    Veena, G.
    Gupta, Deepa
    Lakshmi, S.
    Jacob, Jeenu T.
    [J]. RECENT FINDINGS IN INTELLIGENT COMPUTING TECHNIQUES, VOL 3, 2018, 709 : 31 - 41