Named Entity Recognition Using Conditional Random Fields

Cited by: 7
Authors
Khan, Wahab [1 ,2 ]
Daud, Ali [3 ]
Shahzad, Khurram [4 ]
Amjad, Tehmina [2 ]
Banjar, Ameen [3 ]
Fasihuddin, Heba [3 ]
Affiliations
[1] Univ Sci & Technol, Dept Comp Sci, Bannu 28100, Pakistan
[2] Int Islamic Univ Islamabad, Dept Comp Sci & Software Engn, Islamabad 44000, Pakistan
[3] Univ Jeddah, Coll Comp Sci & Engn, Dept Informat Syst & Technol, Jeddah 21959, Saudi Arabia
[4] Univ Punjab, Dept Data Sci, Lahore 54000, Pakistan
Source
APPLIED SCIENCES-BASEL | 2022 / Volume 12 / Issue 13
Keywords
natural language processing; information filtering; information extraction; machine learning; classification algorithms; named entity recognition; NETWORKS;
DOI
10.3390/app12136391
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Named entity recognition (NER) is an important task in natural language processing and a key information extraction sub-task with numerous application areas. Many attempts have been made at NER for Western and Asian languages. However, little effort has been made to develop techniques for the Urdu language, a prominent South Asian language with hundreds of millions of speakers across the globe. NER in Urdu is considered a hard problem for several reasons, including the paucity of large annotated datasets, an inaccurate tokenizer, and the absence of capitalization in the Urdu language. To this end, this study proposed a conditional-random-field-based technique with both language-dependent and language-independent features, such as part-of-speech tags and context windows of words, respectively. As a second contribution, we developed an Urdu NER dataset (UNER-I) in which a large number of NE types were manually annotated. To evaluate the effectiveness of the proposed approach, as well as the usefulness of the dataset, experiments were performed using the dataset we developed and an existing dataset. The results showed that the proposed technique outperformed the baseline technique on both datasets, improving the F1 scores by 1.5% to 3%. Furthermore, the results demonstrated that the enhanced dataset was useful for learning and prediction in a supervised learning approach.
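The abstract mentions two feature families for the CRF: part-of-speech tags and context windows of words. The sketch below is only an illustration of what such per-token features might look like, not the authors' feature set. The function names, the `<PAD>` marker, the window size, and the romanized example tokens with their POS tags are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, POS tag); the tag set here is hypothetical


def token_features(sent: List[Token], i: int, window: int = 2) -> Dict[str, str]:
    """Build a feature dict for token i from its word, POS tag, and neighbours."""
    word, pos = sent[i]
    feats = {"bias": "1", "word": word, "pos": pos}
    # Context window: look at up to `window` tokens on each side.
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        if 0 <= j < len(sent):
            feats[f"word[{off:+d}]"] = sent[j][0]
            feats[f"pos[{off:+d}]"] = sent[j][1]
        else:
            # Mark positions that fall outside the sentence boundary.
            feats[f"word[{off:+d}]"] = "<PAD>"
    return feats


def sentence_features(sent: List[Token], window: int = 2) -> List[Dict[str, str]]:
    """Feature dicts for every token in one sentence."""
    return [token_features(sent, i, window) for i in range(len(sent))]


# Illustrative romanized sentence with made-up POS tags.
example = [("Ali", "NNP"), ("Lahore", "NNP"), ("gaya", "VB")]
features = sentence_features(example, window=1)
```

Per-token dictionaries of this kind could then be passed, together with the gold NE labels, to a CRF toolkit that accepts feature dictionaries as input.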
Pages: 18
Related papers
50 in total
  • [21] Advanced Feature-Driven Disease Named Entity Recognition Using Conditional Random Fields
    Rahman, Hidayat
    Hahn, Thomas
    Segall, Richard
    [J]. PROCEEDINGS OF THE 7TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS, 2016, : 469 - 469
  • [22] FINANCIAL NAMED ENTITY RECOGNITION BASED ON CONDITIONAL RANDOM FIELDS AND INFORMATION ENTROPY
    Wang, Shuwei
    Xu, Ruifeng
    Liu, Bin
    Gui, Lin
    Zhou, Yu
    [J]. PROCEEDINGS OF 2014 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOL 2, 2014, : 838 - 843
  • [23] Named Entity Recognition for Setswana Language: A Conditional Random Fields (CRF) Approach
    Okgetheng, Boago
    Malema, Gabofetswe
    [J]. PROCEEDINGS OF 2023 7TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, NLPIR 2023, 2023, : 240 - 244
  • [24] Based Cascaded Conditional Random Fields Model for Chinese Named Entity Recognition
    Zhang Suxiang
    [J]. ICSP: 2008 9TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-5, PROCEEDINGS, 2008, : 1574 - 1578
  • [25] Exploring unsupervised features in Conditional Random Fields for Spanish Named Entity Recognition
    Copara, Jenny
    Ochoa, Jose
    Thorne, Camilo
    Glavas, Goran
    [J]. PROCEEDINGS OF 2016 5TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS 2016), 2016, : 283 - 288
  • [26] Conditional random fields for clinical named entity recognition: A comparative study using Korean clinical texts
    Lee, Wangjin
    Kim, Kyungmo
    Lee, Eun Young
    Choi, Jinwook
    [J]. COMPUTERS IN BIOLOGY AND MEDICINE, 2018, 101 : 7 - 14
  • [27] Improving the Scalability of Semi-Markov Conditional Random Fields for Named Entity Recognition
    Okanohara, Daisuke
    Miyao, Yusuke
    Tsuruoka, Yoshimasa
    Tsujii, Jun'ichi
    [J]. COLING/ACL 2006, VOLS 1 AND 2, PROCEEDINGS OF THE CONFERENCE, 2006, : 465 - 472
  • [28] Lao Named Entity Recognition based on Conditional Random Fields with Simple Heuristic Information
    Yang, Mengjie
    Zhou, Lanjiang
    Yu, Zhengtao
    Gao, Shengxiang
    Guo, Jianyi
    [J]. 2015 12TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD), 2015, : 1426 - 1431
  • [29] Named Entity Recognition with Conditional Random Fields on Turkish News Dataset: Revisiting the Features
    Cekinel, Recep Firat
    Agriman, Mustafa
    Karagoz, Pinar
    Yilmaz, Burcu
    [J]. 2019 27TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2019,
  • [30] Incorporating dictionary features into conditional random fields for gene/protein named entity recognition
    Lin, Hongfei
    Li, Yanpeng
    Yang, Zhihao
    [J]. EMERGING TECHNOLOGIES IN KNOWLEDGE DISCOVERY AND DATA MINING, 2007, 4819 : 162 - 173