An Unsupervised Feature Selection Framework for Social Media Data

Cited by: 48
Authors
Tang, Jiliang [1]
Liu, Huan [1]
Affiliation
[1] Arizona State Univ, Dept Comp Sci, Tempe, AZ 85281 USA
Funding
National Science Foundation (USA)
Keywords
Unsupervised feature selection; linked data; social media; pseudo labels; social dimension regularization
DOI
10.1109/TKDE.2014.2320728
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The explosive usage of social media produces massive amounts of unlabeled, high-dimensional data. Feature selection has proven effective in dealing with high-dimensional data for efficient learning and data mining. Unsupervised feature selection remains a challenging task because of the absence of label information, on which feature relevance is often assessed. The unique characteristics of social media data further complicate the already challenging problem of unsupervised feature selection; e.g., social media data is inherently linked, which invalidates the independent and identically distributed (i.i.d.) assumption and brings new challenges to unsupervised feature selection algorithms. In this paper, we investigate a novel problem of feature selection for social media data in an unsupervised scenario. In particular, we analyze the differences between social media data and traditional attribute-value data, investigate how the relations extracted from linked data can be exploited to help select relevant features, and propose a novel unsupervised feature selection framework, LUFS, for linked social media data. We design and conduct systematic experiments to evaluate the proposed framework on data sets from real-world social media websites. The empirical study demonstrates the effectiveness and potential of the proposed framework.
Pages: 2914-2927
Page count: 14
Related papers
50 in total
  • [31] An Embedded Feature Selection Framework for Hybrid Data
    Boroujeni, Forough Rezaei
    Stantic, Bela
    Wang, Sen
    DATABASES THEORY AND APPLICATIONS, ADC 2017, 2017, 10538 : 138 - 150
  • [32] A novel feature selection framework for incomplete data
    Guo, Cong
    Yang, Wei
    Li, Zheng
    Liu, Chun
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2024, 252
  • [33] Semi-Supervised Feature Selection with Universum Based on Linked Social Media Data
    Qiu, Junyang
    Wang, Yibing
    Pan, Zhisong
    Jia, Bo
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (09): : 2522 - 2525
  • [34] Graph-Based Semi-supervised Feature Selection for Social Media Data
    Wang, Na
    Liu, Zhihui
    Li, Xia
    FOUNDATIONS OF INTELLIGENT SYSTEMS (ISKE 2013), 2014, 277 : 115 - 124
  • [35] Embedded Unsupervised Feature Selection
    Wang, Suhang
    Tang, Jiliang
    Liu, Huan
    PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015, : 470 - 476
  • [36] Feature selection for unsupervised learning
    Dy, JG
    Brodley, CE
    JOURNAL OF MACHINE LEARNING RESEARCH, 2004, 5 : 845 - 889
  • [37] Unsupervised Personalized Feature Selection
    Li, Jundong
    Wu, Liang
    Dani, Harsh
    Liu, Huan
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 3514 - 3521
  • [38] Feature Selection for Unsupervised Learning
    Adhikary, Jyoti Ranjan
    Murty, M. Narasimha
    NEURAL INFORMATION PROCESSING, ICONIP 2012, PT III, 2012, 7665 : 382 - 389
  • [39] A Novel Unsupervised Feature Selection Method for Bioinformatics Data Sets through Feature Clustering
    Li, Guangrong
    Hu, Xiaohua
    Shen, Xiajiong
    Chen, Xin
    Li, Zhoujun
    2008 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING, VOLS 1 AND 2, 2008, : 41 - +
  • [40] Feature weighting as a tool for unsupervised feature selection
    Panday, Deepak
    de Amorim, Renato Cordeiro
    Lane, Peter
    INFORMATION PROCESSING LETTERS, 2018, 129 : 44 - 52