Automatic Entity Recognition and Typing in Massive Text Data

Cited by: 9
Authors
Ren, Xiang [1]
El-Kishky, Ahmed [1]
Ji, Heng [2]
Han, Jiawei [1]
Affiliations
[1] Univ Illinois, Urbana, IL 61801 USA
[2] Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12180 USA
Funding
U.S. National Science Foundation
Keywords
WEB
DOI
10.1145/2882903.2912567
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In today's computerized and information-based society, individuals are constantly presented with vast amounts of text data, ranging from news articles, scientific publications, and product reviews to a wide range of textual information from social media. To extract value from these large, multi-domain pools of text, it is of great importance to gain an understanding of entities and their relationships. In this tutorial, we introduce data-driven methods to recognize typed entities of interest in massive, domain-specific text corpora. These methods can automatically identify token spans as entity mentions in documents and label their fine-grained types (e.g., people, product, and food) in a scalable way. Because these methods do not rely on annotated data, pre-defined typing schemas, or hand-crafted features, they can be quickly adapted to new domains, genres, and languages. We demonstrate on real datasets spanning various genres (e.g., news articles, discussion forum posts, and tweets), domains (general vs. biomedical), and languages (e.g., English, Chinese, Arabic, and even low-resource languages such as Hausa and Yoruba) how these typed entities aid in knowledge discovery and management.
Pages: 2235-2239
Page count: 5