UCPhrase: Unsupervised Context-aware Quality Phrase Tagging

Times cited: 10
Authors
Gu, Xiaotao [1]
Wang, Zihan [2]
Bi, Zhenyu [2]
Meng, Yu [1]
Liu, Liyuan [1]
Han, Jiawei [1]
Shang, Jingbo [2]
Affiliations
[1] Univ Illinois, Champaign, IL 61820 USA
[2] Univ Calif San Diego, La Jolla, CA 92093 USA
Funding
National Science Foundation (NSF);
Keywords
phrase mining; language models; unsupervised method;
DOI
10.1145/3447548.3467397
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Identifying and understanding quality phrases from context is a fundamental task in text mining. The most challenging part of this task arguably lies in uncommon, emerging, and domain-specific phrases. The infrequent nature of these phrases significantly hurts the performance of phrase mining methods that rely on sufficient phrase occurrences in the input corpus. Context-aware tagging models, though not restricted by frequency, heavily rely on domain experts for either massive sentence-level gold labels or handcrafted gazetteers. In this work, we propose UCPhrase, a novel unsupervised context-aware quality phrase tagger. Specifically, we induce high-quality phrase spans as silver labels from consistently co-occurring word sequences within each document. Compared with typical context-agnostic distant supervision based on existing knowledge bases (KBs), our silver labels are deeply rooted in the input domain and context, thus having unique advantages in preserving contextual completeness and capturing emerging, out-of-KB phrases. Training a conventional neural tagger on silver labels usually risks overfitting to phrase surface names. Alternatively, we observe that the contextualized attention maps generated from a Transformer-based neural language model effectively reveal the connections between words in a surface-agnostic way. Therefore, we pair such attention maps with the silver labels to train a lightweight span prediction model, which can be applied to new input to recognize (unseen) quality phrases regardless of their surface names or frequency. Thorough experiments on various tasks and datasets, including corpus-level phrase ranking, document-level keyphrase extraction, and sentence-level phrase tagging, demonstrate the superiority of our design over state-of-the-art pre-trained, unsupervised, and distantly supervised methods.
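The silver-label step described in the abstract can be approximated, very roughly, by counting word sequences that recur inside a single document. The Python sketch below is an illustration under stated assumptions and not the authors' implementation: the function name `mine_silver_labels`, the occurrence threshold `min_count`, the n-gram length range, and the rule that keeps only the longest sequence among equally frequent nested candidates are all hypothetical choices. It also omits the attention-map features and the span prediction model that UCPhrase trains on top of these labels.

```python
from collections import Counter


def mine_silver_labels(sentences, min_count=2, min_len=2, max_len=5):
    """Hypothetical sketch: return word sequences that recur within ONE document.

    sentences: list of token lists, all belonging to the same document.
    min_count: assumed minimum number of occurrences inside the document.
    min_len, max_len: assumed range of n-gram lengths to consider.
    """
    # Count every n-gram (within sentence boundaries) in the document.
    counts = Counter()
    for tokens in sentences:
        for n in range(min_len, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1

    # Assumed proxy for "consistently co-occurring": seen at least min_count times.
    frequent = {gram for gram, c in counts.items() if c >= min_count}

    def is_subsumed(gram):
        # Assumed completeness rule: drop a candidate if a longer frequent
        # sequence contains it and occurs exactly as often.
        for other in frequent:
            if len(other) > len(gram) and counts[other] == counts[gram]:
                for j in range(len(other) - len(gram) + 1):
                    if other[j:j + len(gram)] == gram:
                        return True
        return False

    # Longest spans first, ties broken alphabetically for a stable order.
    return sorted((g for g in frequent if not is_subsumed(g)),
                  key=lambda g: (-len(g), g))


if __name__ == "__main__":
    doc = [
        "we propose a quality phrase tagger".split(),
        "the quality phrase tagger uses attention maps".split(),
        "attention maps reveal word connections".split(),
    ]
    print(mine_silver_labels(doc))
    # prints [('quality', 'phrase', 'tagger'), ('attention', 'maps')]
```

Here, raw frequency within one document stands in for the paper's notion of "consistently co-occurring"; a practical pipeline would likely also need tokenization, stopword handling, and other normalization that this sketch does not attempt.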
Pages: 478-486
Page count: 9