Inside Importance Factors of Graph-Based Keyword Extraction on Chinese Short Text

Cited by: 4
Authors
Chen, Junjie [1, 2]
Hou, Hongxu [1 ]
Gao, Jing [2 ]
Affiliations
[1] Inner Mongolia Univ, Coll Comp Sci, 235 West Univ Rd, Hohhot 010021, Inner Mongolia, Peoples R China
[2] Inner Mongolia Agr Univ, Coll Comp Sci & Informat Engn, 306 Zhao Wuda Rd, Hohhot 010018, Inner Mongolia, Peoples R China
Keywords
Short text; keyword extraction; importance rank; keyphrase extraction
DOI
10.1145/3388971
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Keywords are the important words in a text and provide a concise representation of it. With the surge of unlabeled short text on the Internet, the automatic keyword extraction task has proven useful in other information processing applications. Graph-based approaches are prevalent unsupervised models for this task. However, most of these methods emphasize the relations between words without considering other importance factors. Furthermore, when measuring the importance of a word in a text, the damping factor is set to 0.85, following PageRank. To the best of our knowledge, no existing work investigates the impact of the damping factor on the keyword extraction task. In addition, there are few publicly available labeled Chinese short text datasets for this task. In this article, we investigate the importance factors of words in a given document and propose an improved graph-based method for keyword extraction from short documents. Moreover, we analyze the impact of these importance factors on performance. We also provide annotated long and short Chinese datasets for this task. The model is evaluated on Chinese and English datasets, and the results show that it improves on previous unsupervised models on short documents. Comparative experiments show that the damping factor is related to text length, a relationship that traditional methods neglect.
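As a rough illustration of where the damping factor enters a graph-based keyword ranker, the following is a minimal sketch of a generic PageRank-style scorer over a word co-occurrence graph. It is not the model proposed in the article. The function names `build_cooccurrence_graph` and `rank_words`, the window size, the iteration count, and the toy token list are all assumptions made for this example.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Link each pair of distinct words that occur within `window` positions.

    Hypothetical helper: with the default window of 2, only directly
    adjacent words are linked.
    """
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def rank_words(graph, damping=0.85, iterations=50):
    """Plain PageRank on an undirected, unweighted graph.

    Assumes every node has at least one neighbour, which holds for
    graphs produced by build_cooccurrence_graph.
    """
    n = len(graph)
    scores = {word: 1.0 / n for word in graph}
    for _ in range(iterations):
        # Each word keeps a uniform share (1 - damping) / n and receives
        # the rest from its neighbours, split evenly over their edges.
        scores = {
            word: (1.0 - damping) / n
            + damping * sum(scores[nb] / len(graph[nb]) for nb in graph[word])
            for word in graph
        }
    return scores


# Toy short text, already tokenized (an assumption of this sketch).
tokens = ["graph", "keyword", "extraction", "short", "text", "keyword", "rank"]
graph = build_cooccurrence_graph(tokens)
for d in (0.5, 0.85):
    scores = rank_words(graph, damping=d)
    top = sorted(scores, key=scores.get, reverse=True)[:3]
    print(d, top)
```

Lowering `damping` pulls all scores toward the uniform value, so graph structure matters less. Varying it across texts of different lengths is one simple way to probe the kind of length dependence the abstract reports.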
Pages: 15