Automatic Entity Recognition and Typing in Massive Text Data

Cited by: 9
Authors
Ren, Xiang [1]
El-Kishky, Ahmed [1]
Ji, Heng [2]
Han, Jiawei [1]
Affiliations
[1] Univ Illinois, Urbana, IL 61801 USA
[2] Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12180 USA
Funding
National Science Foundation (USA);
Keywords
WEB;
DOI
10.1145/2882903.2912567
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In today's computerized and information-based society, individuals are constantly presented with vast amounts of text data, ranging from news articles, scientific publications, and product reviews to a wide range of textual information from social media. To extract value from these large, multi-domain pools of text, it is of great importance to gain an understanding of entities and their relationships. In this tutorial, we introduce data-driven methods to recognize typed entities of interest in massive, domain-specific text corpora. These methods can automatically identify token spans as entity mentions in documents and label their fine-grained types (e.g., people, product, and food) in a scalable way. Since these methods do not rely on annotated data, pre-defined typing schemas, or hand-crafted features, they can be quickly adapted to a new domain, genre, or language. Using real datasets spanning various genres (e.g., news articles, discussion forum posts, and tweets), domains (general vs. biomedical), and languages (e.g., English, Chinese, Arabic, and even low-resource languages like Hausa and Yoruba), we demonstrate how these typed entities aid in knowledge discovery and management.
Pages: 2235-2239
Page count: 5