A Text Clustering Approach of Chinese News Based on Neural Network Language Model

Authors
Zhaoxin Fan
Shuoying Chen
Li Zha
Jiadong Yang
Affiliations
[1] Beijing Institute of Technology, School of Computer Science and Technology
[2] Chinese Academy of Sciences, Institute of Computing Technology
[3] Sohu.com Inc
Keywords
Data mining; Fuzzy k-means; Language model; Chinese news
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Text clustering plays an important role in data mining and machine learning. After years of development, clustering technology has produced a series of theories and methods. However, for text clustering of Chinese news, the mainstream LDA method suffers from high time complexity. To improve speed, this paper proposes a new method in which a neural network language model is applied to text clustering for the first time. Text clustering is first converted into its dual problem, word clustering. The neural network language model yields word vectors, which are then used to run fuzzy k-means on the keyword set of the Chinese news. From the keyword clustering result, the text clustering result for the news is obtained in a single transformation step. Experiments show that this method runs five times faster than LDA. The method has been successfully deployed in the Sohu news recommendation system.
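The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy two-dimensional word vectors stand in for vectors that would come from a neural network language model, the function names `fuzzy_kmeans` and `cluster_documents` are hypothetical, and the keyword-to-document step (averaging the fuzzy memberships of a document's keywords) is an assumption, since the abstract does not specify how that step is carried out.

```python
import numpy as np

def fuzzy_kmeans(X, k, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy k-means (fuzzy c-means) on the rows of X.

    Returns the membership matrix U (n_points x k) and the cluster centers.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], k))
    U /= U.sum(axis=1, keepdims=True)           # each row sums to 1
    for _ in range(iters):
        W = U ** m                              # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every point to every center (small offset avoids 0).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

def cluster_documents(doc_keywords, vocab, U):
    """Assumed keyword-to-text step: average keyword memberships, take argmax."""
    index = {w: i for i, w in enumerate(vocab)}
    labels = []
    for words in doc_keywords:
        rows = [index[w] for w in words if w in index]
        if not rows:
            labels.append(-1)                   # no known keyword
            continue
        labels.append(int(U[rows].mean(axis=0).argmax()))
    return labels

# Toy placeholder vectors; real ones would come from the language model.
vocab = ["football", "league", "stock", "market"]
vectors = np.array([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]])
docs = [["football", "league"], ["stock", "market"], ["league", "football"]]

U, centers = fuzzy_kmeans(vectors, k=2)
labels = cluster_documents(docs, vocab, U)
print(labels)
```

Clustering the keyword vocabulary rather than the documents means the iterative step runs over the keyword set only, after which each document is labeled in one pass over its keywords.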
Pages: 198-206
Page count: 8
Related papers
  • [1] A Text Clustering Approach of Chinese News Based on Neural Network Language Model
    Fan, Zhaoxin
    Chen, Shuoying
    Zha, Li
    Yang, Jiadong
    [J]. INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2016, 44 (01) : 198 - 206