Automatic Entity Recognition and Typing in Massive Text Data

Cited by: 9
Authors
Ren, Xiang [1]
El-Kishky, Ahmed [1]
Ji, Heng [2]
Han, Jiawei [1]
Affiliations
[1] Univ Illinois, Urbana, IL 61801 USA
[2] Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12180 USA
Funding
National Science Foundation (USA)
Keywords
WEB
DOI
10.1145/2882903.2912567
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In today's computerized and information-based society, individuals are constantly presented with vast amounts of text data, ranging from news articles, scientific publications, and product reviews to a wide range of textual information from social media. To extract value from these large, multi-domain pools of text, it is important to understand the entities they mention and the relationships among those entities. In this tutorial, we introduce data-driven methods to recognize typed entities of interest in massive, domain-specific text corpora. These methods can automatically identify token spans as entity mentions in documents and label their fine-grained types (e.g., people, product, and food) in a scalable way. Since these methods do not rely on annotated data, pre-defined typing schemas, or hand-crafted features, they can be quickly adapted to a new domain, genre, or language. We demonstrate on real datasets spanning various genres (e.g., news articles, discussion forum posts, and tweets), domains (general vs. biomedical), and languages (e.g., English, Chinese, Arabic, and even low-resource languages like Hausa and Yoruba) how these typed entities aid in knowledge discovery and management.
Pages: 2235-2239
Page count: 5