Microblog Dimensionality Reduction-A Deep Learning Approach

Cited by: 13
Authors
Xu, Lei [1]
Jiang, Chunxiao [2]
Ren, Yong [2]
Chen, Hsiao-Hwa [3]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China
[3] Natl Cheng Kung Univ, Dept Engn Sci, Tainan 70101, Taiwan
Funding
China Postdoctoral Science Foundation
Keywords
Microblog mining; dimension reduction; text representation; semantic relatedness; deep autoencoder
DOI
10.1109/TKDE.2016.2540639
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Extracting potentially useful information from the huge amount of textual data produced by microblogging services has attracted much attention in recent years. An important preprocessing step in microblog text mining is to convert natural language texts into suitable numerical representations. Because microblog texts are short, representing them with term frequency vectors leads to the "sparse data" problem, so finding suitable representations is challenging. In this paper, we apply deep networks to map the high-dimensional representations of microblog texts to low-dimensional representations. To improve the result of dimensionality reduction, we take advantage of the semantic similarity derived from two types of microblog-specific information, namely the retweet relationship and hashtags. Two types of approaches, modifying the training data and modifying the training objective of the deep networks, are proposed to make use of this microblog-specific information. Experimental results show that the deep models perform better than traditional dimensionality reduction methods such as latent semantic analysis and the latent Dirichlet allocation topic model, and that using microblog-specific information helps to learn better representations.
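The idea of modifying the training objective can be illustrated with a minimal sketch. This is not the authors' model: it assumes a single-hidden-layer autoencoder rather than a deep network, a squared-distance penalty between the codes of related posts (for example, posts sharing a hashtag), and toy data. All names (`train_autoencoder`, `pairs`, `lam`) are hypothetical.

```python
import numpy as np

def train_autoencoder(X, pairs, code_dim=2, lam=0.1, lr=0.05, epochs=200, seed=0):
    """Train a tiny autoencoder on term-frequency rows of X.

    pairs lists (i, j) row indices of posts assumed to be semantically
    related; their low-dimensional codes are pulled together (assumption).
    Returns the codes and the loss recorded at each epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, code_dim))
    b1 = np.zeros(code_dim)
    W2 = rng.normal(0.0, 0.1, (code_dim, d))
    b2 = np.zeros(d)
    I = np.array([p[0] for p in pairs], dtype=int)
    J = np.array([p[1] for p in pairs], dtype=int)
    m = max(len(pairs), 1)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)   # low-dimensional codes
        R = H @ W2 + b2            # linear reconstruction
        E = R - X                  # reconstruction error
        D = H[I] - H[J]            # code differences of related pairs
        losses.append((E ** 2).sum() / n + lam * (D ** 2).sum() / m)

        # Backpropagate both terms of the objective.
        dR = 2.0 * E / n
        dW2 = H.T @ dR
        db2 = dR.sum(axis=0)
        dH = dR @ W2.T
        gD = 2.0 * lam * D / m
        np.add.at(dH, I, gD)       # accumulate, since indices may repeat
        np.add.at(dH, J, -gD)
        dZ = dH * (1.0 - H ** 2)   # derivative of tanh
        dW1 = X.T @ dZ
        db1 = dZ.sum(axis=0)

        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    codes = np.tanh(X @ W1 + b1)
    return codes, losses

# Toy data: 6 short posts over a 6-term vocabulary (made up for illustration).
X = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1],
], dtype=float)
pairs = [(0, 1), (1, 2), (3, 4), (4, 5)]  # e.g. posts sharing a hashtag

codes, losses = train_autoencoder(X, pairs)
```

Setting `lam=0` reduces the sketch to a plain autoencoder, which makes it easy to compare codes learned with and without the pairwise term.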
Pages: 1779-1789
Number of pages: 11