ConPhrase: Enhancing Context-Aware Phrase Mining From Text Corpora

Cited by: 0
Authors
Zhang, Xue [1]
Li, Qinghua [1]
Li, Cuiping [1]
Chen, Hong [1]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing 100872, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
Data mining; Noise measurement; Relational databases; Semantics; Labeling; Task analysis; Hypertension; Information extraction; phrase mining; quality phrase recognition; RECOGNITION;
DOI
10.1109/TKDE.2022.3193126
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Phrase mining is an essential step when transforming unstructured text into structured information, in which the aim is to extract high-quality phrases from given corpora automatically. Existing statistics-based methods have achieved state-of-the-art performance on this task. However, such methods often rely heavily on statistical signals to extract quality phrases, ignoring the effect of contextual information. In this paper, we propose a novel context-aware method, called ConPhrase, for quality phrase mining under distantly supervised settings. Specifically, ConPhrase formulates phrase mining as a sequence labeling problem by considering local contextual information, and also incorporates distant supervision methods to automatically generate labeled data. It comprises two modules designed to tackle global information scarcity and noisy data filtration: 1) a topic-aware phrase recognition network that incorporates domain-related topic information into word representation learning to identify quality phrases effectively; 2) an instance selection network that focuses on choosing correct sentences with reinforcement learning for improving the prediction performance of the phrase recognition network. Moreover, we also propose an extended variant of ConPhrase, called ConPhrase+, that further enhances phrase recognition by utilizing document-level contextual information across sentences within the entire document. Experimental results show that contextual information is indispensable for phrase mining and our context-aware methods perform significantly better than state-of-the-art approaches on three publicly available datasets.
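The abstract describes casting phrase mining as sequence labeling, with training labels generated automatically by distant supervision. The sketch below illustrates that general idea only: it tags tokens with B/I/O labels by greedy longest matching against a small phrase dictionary. The function name `distant_bio_labels`, the B/I/O tag set, the `max_len` parameter, and the toy dictionary are all assumptions made for illustration; the abstract does not specify the labeling scheme or knowledge source that ConPhrase actually uses.

```python
from typing import List, Set, Tuple


def distant_bio_labels(
    tokens: List[str],
    phrase_dict: Set[Tuple[str, ...]],
    max_len: int = 4,
) -> List[str]:
    """Tag tokens with B/I/O by greedy longest match against a phrase dictionary.

    Hypothetical helper: the dictionary stands in for whatever external
    resource supplies known quality phrases in a distantly supervised setup.
    """
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first so that longer phrases win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in phrase_dict:
                matched = n
                break
        if matched:
            labels[i] = "B"  # first token of a matched phrase
            for j in range(i + 1, i + matched):
                labels[j] = "I"  # remaining tokens of the phrase
            i += matched
        else:
            i += 1  # no phrase starts here; leave the token as "O"
    return labels


# Toy dictionary of known quality phrases (illustrative only).
phrases = {("phrase", "mining"), ("text", "corpora"), ("reinforcement", "learning")}
sentence = "Phrase mining extracts quality phrases from text corpora".split()
labels = distant_bio_labels(sentence, phrases)
print(list(zip(sentence, labels)))
```

In this toy run, "quality phrases" receives only "O" tags because it is missing from the dictionary. Incomplete labels of this kind are one plausible source of the noise that the abstract's instance selection network is meant to filter.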
Pages: 6767 - 6783
Page count: 17
Related papers
50 in total
  • [1] Automated Context-Aware Phrase Mining from Text Corpora
    Zhang, Xue
    Li, Qinghua
    Li, Cuiping
    Chen, Hong
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2021), PT II, 2021, 12682 : 20 - 36
  • [2] Scalable Topical Phrase Mining from Text Corpora
    El-Kishky, Ahmed
    Song, Yanglei
    Wang, Chi
    Voss, Clare R.
    Han, Jiawei
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2014, 8 (03) : 305 - 316
  • [3] Automated Phrase Mining from Massive Text Corpora
    Shang, Jingbo
    Liu, Jialu
    Jiang, Meng
    Ren, Xiang
    Voss, Clare R.
    Han, Jiawei
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2018, 30 (10) : 1825 - 1837
  • [4] A Context-Aware Recommender Method Based on Text Mining
    Sundermann, Camila Vaccari
    de Padua, Renan
    Tonon, Vitor Rodrigues
    Domingues, Marcos Aurelio
    Rezende, Solange Oliveira
    [J]. PROGRESS IN ARTIFICIAL INTELLIGENCE, PT II, 2019, 11805 : 385 - 396
  • [5] A context-aware recommender method based on text and opinion mining
    Sundermann, Camila Vaccari
    de Padua, Renan
    Tonon, Vitor Rodrigues
    Marcacini, Ricardo Marcondes
    Domingues, Marcos Aurelio
    Rezende, Solange Oliveira
    [J]. EXPERT SYSTEMS, 2020, 37 (06)
  • [6] Context-Aware Phrase Representation for Statistical Machine Translation
    Ruan, Zhiwei
    Su, Jinsong
    Xiong, Deyi
    Ji, Rongrong
    [J]. PRICAI 2018: TRENDS IN ARTIFICIAL INTELLIGENCE, PT I, 2018, 11012 : 137 - 149
  • [7] UCPhrase: Unsupervised Context-aware Quality Phrase Tagging
    Gu, Xiaotao
    Wang, Zihan
    Bi, Zhenyu
    Meng, Yu
    Liu, Liyuan
    Han, Jiawei
    Shang, Jingbo
    [J]. KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021 : 478 - 486
  • [8] Context-Aware Unsupervised Text Stylization
    Yang, Shuai
    Liu, Jiaying
    Yang, Wenhan
    Guo, Zongming
    [J]. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018 : 1688 - 1696
  • [9] Context-aware Argumentative Relation Mining
    Nguyen, Huy V.
    Litman, Diane J.
    [J]. PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016 : 1127 - 1137
  • [10] Context-aware Outstanding Fact Mining from Knowledge Graphs
    Yang, Yueji
    Li, Yuchen
    Karras, Panagiotis
    Tung, Anthony K. H.
    [J]. KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021 : 2006 - 2016