Probabilistic Named Entity Recognition for non-standard format entities using co-occurrence word embeddings

Authors
Al-Ani, Jabir Alshehabi [1]
Fasli, Maria [1, 2]
Affiliations
[1] Univ Essex, Sch Elect Engn & Comp Sci, Colchester CO4 3SQ, Essex, England
[2] Univ Essex, Sch Elect Engn & Comp Sci, Inst Analyt & Data Sci, Colchester CO4 3SQ, Essex, England
Funding
UK Economic and Social Research Council (ESRC);
Keywords
Named Entity Recognition; Word Embeddings; Co-occurrence patterns; Twitter; Information Extraction; LINKING;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Short texts have become widespread on social media platforms such as Twitter and Facebook, where users typically adopt non-standard formats for terms when posting. This introduces challenges for Information Retrieval (IR) and Natural Language Processing (NLP), and standard or classical methods tend not to perform well in this domain. In this paper, we address one of these challenges: Named Entity Recognition (NER). We introduce a novel probabilistic approach that targets entities occurring in an informal (non-standard) format within short text. The Probabilistic Named Entity Recognition (PNER) model identifies these entities using co-occurrence patterns, which were detected using word co-occurrence embeddings derived from 278.6 million tweets. The results show that combining PNER with two standard methods improves their performance by 7%. The test dataset was created using the standard methods together with street names and places taken from the OpenStreetMap (OSM) database.
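The abstract only outlines how PNER uses co-occurrence patterns. The following Python sketch is a rough illustration of the general idea of co-occurrence-based probabilistic scoring; it is not the authors' PNER model. It makes several assumptions: whitespace tokenisation, a fixed context window of 2 tokens, a hypothetical set of "cue" context words (such as "in", "at", "near") that tend to surround place names, and raw window counts in place of the co-occurrence embeddings used in the paper. The function names `cooccurrence_counts` and `entity_probability` are hypothetical.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(tweets, window=2):
    """Count how often each (word, context word) pair appears within `window` tokens.

    Assumption: simple lowercase whitespace tokenisation, not the paper's preprocessing.
    """
    counts = defaultdict(Counter)
    for text in tweets:
        tokens = text.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def entity_probability(word, counts, cue_words):
    """Estimate P(context word is a cue word | word) from co-occurrence counts.

    `cue_words` is a hypothetical, hand-picked set of entity-indicating neighbours.
    Returns 0.0 for words never seen in the corpus.
    """
    contexts = counts.get(word.lower())
    if not contexts:
        return 0.0
    total = sum(contexts.values())
    hits = sum(contexts[c] for c in cue_words)  # Counter yields 0 for missing keys
    return hits / total


# Usage on a toy corpus (invented examples, not data from the paper).
tweets = [
    "stuck in traffic near oxford st",
    "coffee at oxford st again",
    "love this song so much",
]
cues = {"in", "at", "near"}
counts = cooccurrence_counts(tweets)
print(round(entity_probability("oxford", counts, cues), 3))  # → 0.286
print(entity_probability("song", counts, cues))  # → 0.0
```

A token that often appears next to location cues, such as "oxford" here, receives a higher score than one that does not. A real system would replace the raw counts with learned co-occurrence embeddings and would combine the score with the output of standard NER methods, as the abstract describes.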
Pages: 2077-2086
Number of pages: 10