An Unsupervised Feature Selection Framework for Social Media Data

Cited by: 48
Authors
Tang, Jiliang [1]
Liu, Huan [1]
Affiliation
[1] Arizona State Univ, Dept Comp Sci, Tempe, AZ 85281 USA
Funding
U.S. National Science Foundation;
Keywords
Unsupervised feature selection; linked data; social media; pseudo labels; social dimension regularization;
DOI
10.1109/TKDE.2014.2320728
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The explosive growth of social media produces massive amounts of unlabeled, high-dimensional data. Feature selection has proven effective in handling high-dimensional data for efficient learning and data mining. Unsupervised feature selection remains a challenging task because there is no label information, which is what feature relevance is usually assessed against. The unique characteristics of social media data further complicate this already difficult problem. For example, social media data is inherently linked, which invalidates the assumption that instances are independent and identically distributed (i.i.d.) and poses new challenges for unsupervised feature selection algorithms. In this paper, we investigate the novel problem of unsupervised feature selection for social media data. In particular, we analyze the differences between social media data and traditional attribute-value data, investigate how relations extracted from linked data can be exploited to help select relevant features, and propose LUFS, a novel unsupervised feature selection framework for linked social media data. We design and conduct systematic experiments to evaluate the proposed framework on data sets from real-world social media websites. The empirical study demonstrates the effectiveness and potential of the proposed framework.
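The abstract's central idea, that links between instances can partly substitute for missing labels when judging feature relevance, can be illustrated with a much simpler and well-known technique: the Laplacian Score of He, Cai and Niyogi, computed here with the social link graph as the affinity matrix. This is a minimal sketch under that assumption, not the LUFS framework itself, which the abstract describes in terms of pseudo labels and social dimension regularization. The function name `laplacian_scores` and the toy data are hypothetical.

```python
def laplacian_scores(X, A):
    """Laplacian Score of each feature (column of X) with respect to a link graph A.

    Illustrative sketch only (not LUFS). Assumptions:
      X: n x m feature matrix as a list of rows (one row per user/post).
      A: n x n symmetric, nonnegative adjacency matrix of links, with at least one edge.
    A lower score means the feature varies more smoothly over the links,
    i.e. linked instances tend to have similar values for it.
    """
    n, m = len(X), len(X[0])
    d = [sum(row) for row in A]          # node degrees (diagonal of D)
    vol = sum(d)                         # total degree, 1^T D 1
    scores = []
    for r in range(m):
        f = [X[i][r] for i in range(n)]
        # Remove the degree-weighted mean so a constant feature is not trivially "smooth".
        mean = sum(f[i] * d[i] for i in range(n)) / vol
        g = [v - mean for v in f]
        # g^T L g with L = D - A equals 0.5 * sum_ij A_ij (g_i - g_j)^2 for symmetric A.
        num = 0.5 * sum(A[i][j] * (g[i] - g[j]) ** 2 for i in range(n) for j in range(n))
        # g^T D g normalizes by the feature's (degree-weighted) variance.
        den = sum(d[i] * g[i] ** 2 for i in range(n))
        scores.append(float("inf") if den == 0 else num / den)
    return scores


# Toy graph: users 0-1 are linked, users 2-3 are linked.
A = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
# Feature 0 agrees with the link structure; feature 1 cuts across it.
X = [[1, 1],
     [1, 0],
     [0, 1],
     [0, 0]]
scores = laplacian_scores(X, A)
ranking = sorted(range(len(scores)), key=lambda r: scores[r])
print(scores, ranking)
```

Features are ranked in ascending order of score, so the link-consistent feature 0 is preferred over feature 1. Unlike this per-feature filter, the abstract indicates that LUFS selects features jointly within a single framework; the sketch shows only why link structure carries information about feature relevance when labels are absent.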
Pages: 2914-2927
Number of pages: 14