Unsupervised Feature Selection Algorithm for Dynamic Network Media Data Based on User Correlation

Cited by: 0
Authors
Ren Y.-G. [1]
Wang Y.-L. [1]
Liu Y. [1]
Zhang J. [1]
Affiliation
[1] Department of Computer and Information Technology, Liaoning Normal University, Dalian, 116029, Liaoning
Source
Funding
National Natural Science Foundation of China
Keywords
Correlation; Dynamic network media data; Gradient descent; Tie strength; Unsupervised feature selection;
DOI
10.11897/SP.J.1016.2018.01517
CLC number
Subject classification code
Abstract
With the rapid development of mobile networks and social media, Internet multimedia data, including text, images, and video, is produced continuously, and the demand for learning from and applying such data keeps growing. However, the high dimensionality, complex content, and dynamic updating of Internet multimedia data severely limit the efficiency of feature computation and classification. Moreover, traditional algorithms mainly address feature extraction and classification for static multimedia data and require the data to conform to a specific format. To address these problems, we propose an efficient unsupervised feature selection algorithm based on user correlation, called UFSDUC (Unsupervised Feature Selection Algorithm for Dynamic Network Media Based on User Correlation), which supports real-time feature extraction for dynamic multimedia data. First, we analyze user relationships in social networks and, taking latent social factors into account, abstract three relational models: MFS (Multi-user Follow Same user), SFM (Same user Follow Multi-user), and FEO (Follow Each Other). These models serve as constraints on the unsupervised feature selection process. Second, we build the relationship model using the Laplace operator weighted by the tie strength between users, and then apply the Lagrangian multiplier method to obtain the mathematical expression of the optimal relationship in the feature model. The algorithm also quantifies the tie strength between users: the stronger the correlation between two users, the more similar their feature information is likely to be. In this way, the algorithm obtains an optimal solution for social network multimedia data.
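The three relational models and the tie-strength-weighted Laplacian described above can be illustrated with a small sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the function names, the reading of MFS as "two users share a followee" and of SFM as "two users share a follower", and the weights `a`, `b`, `c` that combine the three models into one tie-strength matrix.

```python
# Illustrative sketch only: the matrix definitions and the tie-strength
# weights (a, b, c) are assumptions, not the paper's formulation.

def relation_matrices(follows, n):
    """Build MFS, SFM and FEO user-user matrices.

    follows: iterable of (i, j) pairs meaning "user i follows user j".
    n: number of users, indexed 0..n-1.
    """
    A = [[0] * n for _ in range(n)]
    for i, j in follows:
        A[i][j] = 1

    mfs = [[0] * n for _ in range(n)]
    sfm = [[0] * n for _ in range(n)]
    feo = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            if u == v:
                continue
            # MFS: number of users that both u and v follow.
            mfs[u][v] = sum(A[u][k] * A[v][k] for k in range(n))
            # SFM: number of users that follow both u and v.
            sfm[u][v] = sum(A[k][u] * A[k][v] for k in range(n))
            # FEO: 1 if u and v follow each other, else 0.
            feo[u][v] = A[u][v] * A[v][u]
    return mfs, sfm, feo


def tie_strength(mfs, sfm, feo, a=1.0, b=1.0, c=1.0):
    """Combine the three relations into one weighted adjacency matrix."""
    n = len(mfs)
    return [[a * mfs[u][v] + b * sfm[u][v] + c * feo[u][v] for v in range(n)]
            for u in range(n)]


def laplacian(W):
    """Graph Laplacian L = D - W, where D is the diagonal degree matrix."""
    n = len(W)
    L = [[-W[u][v] for v in range(n)] for u in range(n)]
    for u in range(n):
        L[u][u] = sum(W[u]) - W[u][u]
    return L


# Toy follow graph with four users (hypothetical data).
follows = [(0, 2), (1, 2), (2, 0), (3, 0), (3, 1)]
mfs, sfm, feo = relation_matrices(follows, 4)
W = tie_strength(mfs, sfm, feo)
L = laplacian(W)
```

In this toy graph, users 0 and 1 both follow user 2 and are both followed by user 3, so they get the largest combined tie strength. A Laplacian built this way penalizes feature representations that differ between strongly tied users, which is the usual role of such a term in graph-regularized feature selection.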
Finally, we set a threshold for the social network multimedia data using the gradient descent method. The threshold is used to obtain the nonzero feature values, after which the best feature subset is updated so that the social network multimedia data can be classified efficiently. The contributions of the proposed algorithm can be summarized as follows: (1) unlike traditional feature selection algorithms, which require a class label for every sample, the proposed unsupervised algorithm can define feature relationships according to different criteria without labeled samples, for instance the similarity between samples and the distribution of local information; (2) information about the correlations between users is more stable than information about individual users; for example, a circle of friends, once established, tends to persist on the Internet, so the proposed method uses user relevance to provide an important constraint for feature extraction from multimedia data; (3) given complete user relevance as a precondition, the proposed algorithm performs feature selection efficiently in real time. We use three standard multimedia datasets to verify the proposed algorithm: the Sina Weibo dataset, the Flickr dataset, and the BlogCatalog dataset from 'Datatang'. These datasets have characteristics that make feature extraction difficult, such as a large number of users, complex relationships between users, and many user categories. We also compare the proposed algorithm with five popular algorithms to evaluate its performance. © 2018, Science Press. All rights reserved.
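The abstract does not state the update rule behind the gradient descent and thresholding step, so the sketch below swaps in a generic, well-known technique: proximal gradient descent (ISTA) with soft thresholding on a quadratic objective. The objective, the names `Q`, `b`, `lam`, and `step`, and the toy numbers are all assumptions made for illustration. It shows only how a threshold applied during gradient descent leaves a set of exactly nonzero feature weights, which then defines the selected subset.

```python
# Generic proximal-gradient (ISTA) sketch; the objective, step size and
# lam are illustrative assumptions, not the paper's update rule.

def soft_threshold(x, t):
    """Shrink x toward zero by t; values within [-t, t] become exactly 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def select_features(Q, b, lam, step=0.5, iters=200):
    """Minimise 0.5*w'Qw - b'w + lam*||w||_1 by gradient steps plus thresholding.

    Returns the weight vector and the indices of features with nonzero weight.
    """
    d = len(b)
    w = [0.0] * d
    for _ in range(iters):
        # Gradient of the smooth part: Q w - b.
        grad = [sum(Q[i][j] * w[j] for j in range(d)) - b[i] for i in range(d)]
        # Gradient step followed by the thresholding (proximal) step.
        w = [soft_threshold(w[i] - step * grad[i], step * lam) for i in range(d)]
    selected = [i for i, wi in enumerate(w) if wi != 0.0]
    return w, selected


# Toy problem with four features and an identity Q (hypothetical data).
Q = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b = [3.0, 0.2, -1.5, 0.05]
w, selected = select_features(Q, b, lam=0.5)
# With Q = I the minimiser is soft_threshold(b_i, lam) per feature,
# so features 0 and 2 keep nonzero weights and features 1 and 3 are dropped.
```

The step size must not exceed the reciprocal of the largest eigenvalue of `Q` for the iteration to converge; 0.5 is safe for the identity matrix used here. In a graph-regularized setting, `Q` might be built from the feature matrix and a user-relationship Laplacian, but that link is a design assumption rather than something the abstract specifies.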
Pages: 1517-1535
Number of pages: 18