News Keyword Extraction Algorithm Based on Semantic Clustering and Word Graph Model

Cited by: 3
Authors
Ao Xiong [1]
Derong Liu [1]
Hongkang Tian [1]
Zhengyuan Liu [1]
Peng Yu [1]
Michel Kadoch [2]
Affiliations
[1] State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications
[2] Ecole de Technologie Superieure, Universite du Quebec
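Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification number
TP391.1 [Text information processing];
Discipline classification code
081203 ; 0835 ;
Abstract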
The internet is an abundant source of news every day. Efficient algorithms for extracting keywords from text are therefore important for obtaining information quickly. However, the precision and recall of mature keyword extraction algorithms need improvement. TextRank, which is derived from the PageRank algorithm, uses word graphs to spread the weight of words. The keyword weight propagation in TextRank focuses only on word frequency. To improve the performance of the algorithm, we propose Semantic Clustering TextRank (SCTR), a semantic clustering news keyword extraction algorithm based on TextRank. First, the word vectors generated by the Bidirectional Encoder Representation from Transformers (BERT) model are clustered with k-means to represent semantic clustering. Then, the clustering results are used to construct a TextRank weight transfer probability matrix. Finally, iterative calculation on the word graph and extraction of keywords are performed. The experiment is conducted on a Chinese news corpus. The results show that the SCTR algorithm achieves greater precision, recall, and F1 value than the traditional TextRank and Term Frequency-Inverse Document Frequency (TF-IDF) algorithms.
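The three steps named in the abstract (cluster word vectors, build a weight transfer probability matrix from the clusters, iterate on the word graph) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: random vectors stand in for BERT word vectors, the co-occurrence window and the `boost` factor applied to same-cluster edges are hypothetical choices (the abstract does not say how the clustering results enter the matrix), and all function names are invented for this example.

```python
import numpy as np

def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means; returns one cluster label per row of `vectors`."""
    rng = np.random.default_rng(seed)
    centers = vectors[rng.choice(len(vectors), size=k, replace=False)]
    labels = np.zeros(len(vectors), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = vectors[labels == c].mean(axis=0)
    return labels

def cooccurrence(tokens, vocab, window=2):
    """Symmetric co-occurrence counts within a sliding window (assumed graph)."""
    index = {w: i for i, w in enumerate(vocab)}
    adj = np.zeros((len(vocab), len(vocab)))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            a, b = index[w], index[tokens[j]]
            if a != b:
                adj[a, b] += 1.0
                adj[b, a] += 1.0
    return adj

def transition_matrix(adj, labels, boost=2.0):
    """Row-stochastic matrix; same-cluster edges are scaled by `boost`.

    The boost rule is an assumption made for illustration only.
    """
    same = labels[:, None] == labels[None, :]
    weights = adj * np.where(same, boost, 1.0)
    row_sums = weights.sum(axis=1, keepdims=True)
    safe = np.where(row_sums > 0, row_sums, 1.0)
    # Rows without any edge fall back to a uniform distribution.
    return np.where(row_sums > 0, weights / safe, 1.0 / len(adj))

def textrank(P, damping=0.85, tol=1e-8, max_iter=200):
    """Power iteration on the transition matrix P (rows sum to 1)."""
    n = len(P)
    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = (1 - damping) / n + damping * (P.T @ scores)
        if np.abs(new - scores).sum() < tol:
            return new
        scores = new
    return scores

# Toy input; the random vectors are a stand-in for BERT word vectors.
tokens = "market stocks rise market bank rates bank stocks fall rates".split()
vocab = sorted(set(tokens))
vectors = np.random.default_rng(0).normal(size=(len(vocab), 8))

labels = kmeans(vectors, k=2)
P = transition_matrix(cooccurrence(tokens, vocab), labels)
scores = textrank(P)
keywords = [vocab[i] for i in np.argsort(-scores)[:3]]
print(keywords)
```

With real data, the random vectors would be replaced by embeddings from a BERT model and the number of clusters tuned on the corpus; both are left open here because the abstract does not specify them.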
Pages: 886 - 893
Number of pages: 8
Related papers
50 in total
  • [1] News Keyword Extraction Algorithm Based on Semantic Clustering and Word Graph Model
    Xiong, Ao
    Liu, Derong
    Tian, Hongkang
    Liu, Zhengyuan
    Yu, Peng
    Kadoch, Michel
    TSINGHUA SCIENCE AND TECHNOLOGY, 2021, 26 (06) : 886 - 893
  • [2] Document keyword extraction based on semantic hierarchical graph model
    Zhang, Tingting
    Lee, Baozhen
    Zhu, Qinghua
    Han, Xi
    Chen, Ke
    SCIENTOMETRICS, 2023, 128 (05) : 2623 - 2647
  • [4] A News Event Clustering Algorithm based on Semantic Relationship Graph
    Liu Zhikang
    Cheng Chunling
    2018 SIXTH INTERNATIONAL CONFERENCE ON ADVANCED CLOUD AND BIG DATA (CBD), 2018 : 100 - 105
  • [5] Short Text Clustering based on Word Semantic Graph with Word Embedding Model
    Jinarat, Supakpong
    Manaskasemsak, Bundit
    Rungsawang, Arnon
    2018 JOINT 10TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 19TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS), 2018 : 1427 - 1432
  • [6] Unsupervised Keyword Extraction Methods Based on a Word Graph Network
    Wang, Hongbin
    Ye, Jingzhen
    Yu, Zhengtao
    Wang, Jian
    Mao, Cunli
    INTERNATIONAL JOURNAL OF AMBIENT COMPUTING AND INTELLIGENCE, 2020, 11 (02) : 68 - 79
  • [8] An Unsupervised Keyword Extraction Method based on Text Semantic Graph
    Zhao, Liujun
    Miao, Zhongquan
    Wang, Chunming
    Kong, Weizheng
    2022 IEEE 6TH ADVANCED INFORMATION TECHNOLOGY, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (IAEAC), 2022 : 1431 - 1436
  • [9] Text Clustering Algorithm Based on the Graph Structures of Semantic Word Co-occurrence
    Jin, Chun-Xia
    Bai, Qiu-Chan
    2016 INTERNATIONAL CONFERENCE ON INFORMATION SYSTEM AND ARTIFICIAL INTELLIGENCE (ISAI 2016), 2016 : 497 - 502
  • [10] Implementing Graph Based Rank on Online News Media Keyword Extraction
    Syafiandini, Arida Ferti
    Mustika, Hani Febri
    Manik, Lindung Parningotan
    Rianto, Yan
    Akbar, Zaenal
    2019 INTERNATIONAL CONFERENCE ON COMPUTER, CONTROL, INFORMATICS AND ITS APPLICATIONS (IC3INA), 2019 : 108 - 113