Inside Importance Factors of Graph-Based Keyword Extraction on Chinese Short Text

Times cited: 4
Authors
Chen, Junjie [1,2]
Hou, Hongxu [1]
Gao, Jing [2]
Affiliations
[1] Inner Mongolia University, College of Computer Science, 235 West University Road, Hohhot 010021, Inner Mongolia, People's Republic of China
[2] Inner Mongolia Agricultural University, College of Computer Science and Information Engineering, 306 Zhao Wuda Road, Hohhot 010018, Inner Mongolia, People's Republic of China
Keywords
Short text; keyword extraction; importance rank; keyphrase extraction
DOI
10.1145/3388971
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Keywords are considered to be the important words in a text and provide a concise representation of it. With the surge of unlabeled short text on the Internet, automatic keyword extraction has proven useful in other information processing applications. Graph-based approaches are widely used unsupervised models for this task. However, most of these methods emphasize the relations between words and overlook other factors that contribute to a word's importance. Furthermore, when measuring the importance of a word in a text, these methods set the damping factor to 0.85, following PageRank. To the best of our knowledge, no existing work has investigated the impact of the damping factor on keyword extraction. In addition, few labeled Chinese short-text datasets for this task are publicly available. In this article, we investigate the factors that make up a word's importance in a given document and propose an improved graph-based method for keyword extraction from short documents. We also analyze the impact of these importance factors on performance, and we provide annotated long and short Chinese datasets for this task. We evaluate the model on Chinese and English datasets, and the results show that it outperforms previous unsupervised models on short documents. Comparative experiments show that the damping factor is related to text length, a relationship that traditional methods neglect.
Pages: 15
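The abstract refers to the standard graph-based setup in which word importance is computed with PageRank's damping factor d = 0.85. The code below is a minimal sketch of that baseline, which is classic TextRank and not the improved method proposed in this article. It builds an undirected co-occurrence graph over already-segmented tokens and iterates S(v) = (1 - d) + d * sum over neighbors u of S(u) / |N(u)|, with d exposed as a parameter. The function names (build_cooccurrence_graph, textrank, top_keywords), the co-occurrence window of 2, and the convergence tolerance are illustrative assumptions. Chinese input would first need word segmentation and stopword filtering, which are not shown.

```python
def build_cooccurrence_graph(tokens, window=2):
    """Undirected word graph: link two distinct words that co-occur within `window` tokens."""
    graph = {}
    for i, word in enumerate(tokens):
        graph.setdefault(word, set())
        # Look ahead at the next (window - 1) tokens only; links are added in both directions.
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word].add(other)
                graph.setdefault(other, set()).add(word)
    return graph


def textrank(tokens, d=0.85, window=2, max_iter=200, tol=1e-6):
    """Iterate S(v) = (1 - d) + d * sum_{u in N(v)} S(u) / |N(u)| until scores stabilize.

    `d` is the damping factor; 0.85 is the PageRank default mentioned in the abstract.
    """
    graph = build_cooccurrence_graph(tokens, window)
    if not graph:
        return {}
    scores = {v: 1.0 for v in graph}
    for _ in range(max_iter):
        # Synchronous update: every new score uses the previous iteration's scores.
        new_scores = {
            v: (1 - d) + d * sum(scores[u] / len(graph[u]) for u in graph[v])
            for v in graph
        }
        delta = max(abs(new_scores[v] - scores[v]) for v in graph)
        scores = new_scores
        if delta < tol:
            break
    return scores


def top_keywords(tokens, k=5, d=0.85, window=2):
    """Return the k highest-scoring words, breaking ties alphabetically."""
    scores = textrank(tokens, d=d, window=window)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    # Toy, already-segmented input (hypothetical); real Chinese text needs segmentation first.
    tokens = ["graph", "keyword", "graph", "text", "graph", "rank"]
    print(top_keywords(tokens, k=1))  # → ['graph']
    for d in (0.5, 0.85):
        print(d, round(textrank(tokens, d=d)["graph"], 3))
```

Lowering d pulls every score toward the uniform value 1 (at d = 0 all words tie), so the gap between well-connected and peripheral words shrinks. The abstract reports that the damping factor is related to text length; this sketch only exposes d as a parameter and does not encode that relationship.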