Automatic Entity Recognition and Typing in Massive Text Data

Times cited: 9
Authors
Ren, Xiang [1]
El-Kishky, Ahmed [1]
Ji, Heng [2]
Han, Jiawei [1]
Affiliations
[1] Univ Illinois, Urbana, IL 61801 USA
[2] Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12180 USA
Funding
US National Science Foundation
Keywords
WEB
DOI
10.1145/2882903.2912567
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In today's computerized, information-based society, individuals are constantly presented with vast amounts of text data, ranging from news articles, scientific publications, and product reviews to a wide range of textual information on social media. To extract value from these large, multi-domain pools of text, it is essential to understand the entities they mention and the relationships among those entities. In this tutorial, we introduce data-driven methods to recognize typed entities of interest in massive, domain-specific text corpora. These methods automatically identify token spans as entity mentions in documents and label them with fine-grained types (e.g., person, product, and food) in a scalable way. Because they rely neither on annotated data, nor on a pre-defined typing schema, nor on hand-crafted features, they can be quickly adapted to new domains, genres, and languages. Using real datasets that span various genres (e.g., news articles, discussion forum posts, and tweets), domains (general vs. biomedical), and languages (e.g., English, Chinese, Arabic, and even low-resource languages such as Hausa and Yoruba), we demonstrate how these typed entities aid knowledge discovery and management.
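The abstract describes the task at a high level: find token spans that are entity mentions and assign each a type, without annotated training data. The Python sketch below illustrates only the input and output of that task under simplifying assumptions; it is not one of the methods covered in the tutorial. It assumes a toy pre-tokenized corpus, treats any n-gram that occurs at least `min_support` times as a candidate mention (a crude stand-in for unsupervised phrase mining), and types candidates by lookup in a small seed dictionary that plays the role of a knowledge base. The function names `mine_candidate_spans` and `type_mentions`, the example sentences, and the `seed_types` dictionary are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Span = Tuple[str, ...]

def mine_candidate_spans(docs: List[List[str]], max_len: int = 3,
                         min_support: int = 2) -> Dict[Span, int]:
    """Count every contiguous n-gram (1 <= n <= max_len) in the corpus and keep
    those seen at least min_support times as candidate entity mentions."""
    counts: Counter = Counter()
    for tokens in docs:
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return {span: c for span, c in counts.items() if c >= min_support}

def type_mentions(tokens: List[str], candidates: Dict[Span, int],
                  seed_types: Dict[Span, str], max_len: int = 3
                  ) -> List[Tuple[int, int, str]]:
    """Scan left to right; at each position take the longest candidate span that
    also appears in the seed dictionary and return (start, end, type) triples."""
    mentions = []
    i = 0
    while i < len(tokens):
        found = None
        # Try longer spans first so the longest dictionary match wins.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in candidates and span in seed_types:
                found = (i, i + n, seed_types[span])
                break
        if found:
            mentions.append(found)
            i = found[1]  # skip past the matched span
        else:
            i += 1
    return mentions

# Hypothetical toy corpus and seed dictionary (stand-in for a knowledge base).
docs = [
    "Barack Obama visited Beijing".split(),
    "Barack Obama praised the dim sum in Beijing".split(),
    "the dim sum was served in Hong Kong".split(),
]
seed_types = {
    ("Barack", "Obama"): "person",
    ("Beijing",): "location",
    ("dim", "sum"): "food",
    ("Hong", "Kong"): "location",
}

candidates = mine_candidate_spans(docs)
for tokens in docs:
    typed = type_mentions(tokens, candidates, seed_types)
    print([(" ".join(tokens[s:e]), t) for s, e, t in typed])
```

Because "Hong Kong" occurs only once in this toy corpus, it never becomes a candidate and stays untyped even though the seed dictionary contains it, which hints at why raw frequency alone is a weak filter for candidate mentions.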
Pages: 2235-2239
Number of pages: 5