Inside Importance Factors of Graph-Based Keyword Extraction on Chinese Short Text

Cited by: 4
Authors
Chen, Junjie [1, 2]
Hou, Hongxu [1 ]
Gao, Jing [2 ]
Affiliations
[1] Inner Mongolia Univ, Coll Comp Sci, 235 West Univ Rd, Hohhot 010021, Inner Mongolia, Peoples R China
[2] Inner Mongolia Agr Univ, Coll Comp Sci & Informat Engn, 306 Zhao Wuda Rd, Hohhot 010018, Inner Mongolia, Peoples R China
Keywords
Short text; keyword extraction; importance rank; keyphrase extraction
DOI
10.1145/3388971
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Keywords are the important words in a text and provide a concise representation of it. With the surge of unlabeled short text on the Internet, automatic keyword extraction has proven useful in other information processing applications. Graph-based approaches are prevalent unsupervised models for this task. However, most of these methods emphasize the relations between words without considering other importance factors. Furthermore, when measuring the importance of a word in a text, the damping factor is set to 0.85, following PageRank. To the best of our knowledge, no existing work investigates the impact of the damping factor on the keyword extraction task. In addition, few labeled Chinese short-text datasets are publicly available for this task. In this article, we investigate the factors that make up a word's importance in a given document and propose an improved graph-based method for keyword extraction from short documents. Moreover, we analyze the impact of these importance factors on performance. We also provide annotated long and short Chinese datasets for this task. The model is evaluated on Chinese and English datasets, and the results show that it improves on previous unsupervised models on short documents. Comparative experiments show that the damping factor is related to text length, a relationship that traditional methods neglect.
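To make the role of the damping factor concrete, the following is a minimal sketch of a generic TextRank-style ranker over a word co-occurrence graph. It is not the method proposed in the article. The function names, the window size, and the iteration count are illustrative assumptions, and the input is assumed to be already tokenized (Chinese text would first need a word segmenter).

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Link every pair of distinct words that occur within `window` positions."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:  # no self-loops
                graph[word].add(other)
                graph[other].add(word)
    return graph


def rank_words(graph, damping=0.85, iterations=50):
    """Iterate S(v) = (1 - d) + d * sum over neighbors u of S(u) / deg(u)."""
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[u] / len(graph[u]) for u in graph[word])
            for word in graph
        }
    return scores


if __name__ == "__main__":
    # Hypothetical pre-tokenized short text, used only for illustration.
    tokens = "graph based keyword extraction ranks words in a word graph".split()
    graph = build_cooccurrence_graph(tokens)
    for d in (0.5, 0.85):  # 0.85 is the conventional PageRank value
        scores = rank_words(graph, damping=d)
        top = sorted(scores, key=scores.get, reverse=True)[:3]
        print(f"damping={d}: {top}")
```

Lowering `damping` pulls all scores toward the uniform value `1 - d`, so the graph structure matters less. This is the parameter whose relationship to text length the abstract reports.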
Pages: 15