Inside Importance Factors of Graph-Based Keyword Extraction on Chinese Short Text

Cited by: 4
Authors
Chen, Junjie [1, 2]
Hou, Hongxu [1 ]
Gao, Jing [2 ]
Affiliations
[1] Inner Mongolia Univ, Coll Comp Sci, 235 West Univ Rd, Hohhot 010021, Inner Mongolia, Peoples R China
[2] Inner Mongolia Agr Univ, Coll Comp Sci & Informat Engn, 306 Zhao Wuda Rd, Hohhot 010018, Inner Mongolia, Peoples R China
Keywords
Short text; keyword extraction; importance rank; keyphrase extraction
DOI
10.1145/3388971
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Keywords are considered to be the important words in a text and can provide a concise representation of it. With the surge of unlabeled short text on the Internet, automatic keyword extraction has proven useful in other information processing applications. Graph-based approaches are prevalent unsupervised models for this task. However, most of these methods emphasize the importance of the relations between words without considering other importance factors. Furthermore, when measuring the importance of a word in a text, the damping factor is conventionally set to 0.85, following PageRank. To the best of our knowledge, no existing work investigates the impact of the damping factor on the keyword extraction task. In addition, there are few publicly available labeled Chinese short text datasets for this task. In this article, we investigate the components of word importance in a given document and propose an improved graph-based method for keyword extraction from short documents. Moreover, we analyze the impact of the importance factors on performance. We also provide annotated long and short Chinese datasets for this task. The model is evaluated on Chinese and English datasets, and the results show that it improves on previous unsupervised models on short documents. Comparative experiments show that the damping factor is related to the text length, a relationship that traditional methods neglect.
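For readers unfamiliar with where the damping factor enters a graph-based extractor, the following is a minimal sketch of a generic TextRank-style baseline, not the method proposed in the article. It assumes the text has already been tokenized (Chinese text would first need a word segmenter), links words that co-occur within a small window, and runs a PageRank-style iteration. The function names `build_cooccurrence_graph` and `rank_words`, the `window` parameter, and the toy token list are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Build an undirected graph linking words that co-occur within `window` positions."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:  # skip self-loops
                graph[word].add(other)
                graph[other].add(word)
    return graph


def rank_words(graph, damping=0.85, iterations=50):
    """Score words with a PageRank-style iteration on an undirected, unweighted graph."""
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    scores = {w: 1.0 / n for w in nodes}
    for _ in range(iterations):
        updated = {}
        for w in nodes:
            # Each neighbor u shares its score equally among its own neighbors.
            incoming = sum(scores[u] / len(graph[u]) for u in graph[w])
            updated[w] = (1.0 - damping) / n + damping * incoming
        scores = updated
    return scores


# Toy usage: compare rankings under several damping factors (illustrative tokens only).
tokens = ["short", "text", "keyword", "extraction", "graph", "keyword", "rank"]
graph = build_cooccurrence_graph(tokens, window=2)
for d in (0.5, 0.85, 0.95):
    scores = rank_words(graph, damping=d)
    top = sorted(scores, key=scores.get, reverse=True)[:3]
    print(d, top)
```

With `damping=0` every word receives the same score, while values closer to 1 let the graph structure dominate, which is why the choice of damping factor can change the ranking.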
Pages: 15
Related papers
50 in total
  • [31] Graph-based extractive text summarization method for Hausa text
    Bichi, Abdulkadir Abubakar
    Samsudin, Ruhaidah
    Hassan, Rohayanti
    Hasan, Layla Rasheed Abdallah
    Rogo, Abubakar Ado
    PLOS ONE, 2023, 18 (05):
  • [32] New Graph-Based Text Summarization Method
    alZahir, Saif
    Fatima, Qandeel
    Cenek, Martin
    2015 IEEE PACIFIC RIM CONFERENCE ON COMMUNICATIONS, COMPUTERS AND SIGNAL PROCESSING (PACRIM), 2015, : 396 - 401
  • [33] A Graph-based Approach to Text Genre Analysis
    Nabhan, Ahmed Ragab
    Shaalan, Khaled
    COMPUTACION Y SISTEMAS, 2016, 20 (03): : 527 - 539
  • [34] Graph-based Arabic text semantic representation
    Etaiwi, Wael
    Awajan, Arafat
    INFORMATION PROCESSING & MANAGEMENT, 2020, 57 (03)
  • [35] Graph-Based Term Weighting for Text Categorization
    Malliaros, Fragkiskos D.
    Skianis, Konstantinos
    PROCEEDINGS OF THE 2015 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM 2015), 2015, : 1473 - 1479
  • [36] Graph-based abstractive biomedical text summarization
    Givchi, Azadeh
    Ramezani, Reza
    Baraani-Dastjerdi, Ahmad
    JOURNAL OF BIOMEDICAL INFORMATICS, 2022, 132
  • [37] Graph-based Text Representation and Knowledge Discovery
    Jin, Wei
    Srihari, Rohini K.
    APPLIED COMPUTING 2007, VOL 1 AND 2, 2007, : 807 - 811
  • [38] Graph-based ensemble method for text line segmentation in offline Chinese handwritten documents
    Huang, L. (huangliang1576@gmail.com), 1600, Huazhong University of Science and Technology (42):
  • [39] Graph-based Text Classification by Contrastive Learning with Text-level Graph Augmentation
    Li, Ximing
    Wang, Bing
    Wang, Yang
    Wang, Meng
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2024, 18 (04)
  • [40] RoundTripRank: Graph-based Proximity with Importance and Specificity
    Fang, Yuan
    Chang, Kevin Chen-Chuan
    Lauw, Hady W.
    2013 IEEE 29TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2013, : 613 - 624