Large-Scale Heterogeneous Feature Embedding

Cited by: 0
Authors
Huang, Xiao [1]
Song, Qingquan [1]
Yang, Fan [1]
Hu, Xia [1]
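Affiliation
[1] Texas A&M Univ, Dept Comp Sci & Engn, College Stn, TX 77843 USA
Keywords
DIMENSIONALITY
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405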
Abstract
Feature embedding aims to learn a low-dimensional vector representation for each instance that preserves the information in its features. These representations can benefit a wide range of off-the-shelf learning algorithms. While embedding models for a single type of feature have been well studied, real-world instances often contain multiple types of correlated features, or even information in a different modality such as networks. Existing studies, such as multiview learning, show that learning unified vector representations from all sources is promising. However, the high computational cost of incorporating heterogeneous information limits the applicability of existing algorithms, because in practice both the number of instances and the feature dimensionality are often large. To bridge this gap, we propose FeatWalk, a scalable framework that models instance similarities with respect to different types of features and incorporates them into a unified embedding representation. To achieve scalability, FeatWalk does not compute any similarity measure directly; instead, it simulates similarity-based random walks among instances to extract local instance proximity and preserves it in a set of instance index sequences. These sequences are homogeneous with one another, and a scalable word embedding algorithm is applied to them to learn a joint embedding representation of the instances. Experiments on four real-world datasets demonstrate the efficiency and effectiveness of FeatWalk.
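The abstract describes FeatWalk only at a high level. The sketch below is our own illustration of the idea, not the authors' implementation. It assumes each feature type ("view") is given as a list of sparse rows of nonnegative weights, and that one walk step moves instance -> shared feature -> instance with probabilities proportional to those weights, so no pairwise similarity matrix is ever built. The function names (`feature_walks`, `multi_view_walks`, `skipgram_pairs`) are hypothetical, and the word-embedding step is only indicated by producing the (center, context) pairs a skip-gram style model would consume; no model is trained here.

```python
import random
from collections import defaultdict


def build_inverted_index(view):
    """Map each feature to the (instance, weight) pairs that contain it."""
    index = defaultdict(list)
    for i, row in enumerate(view):
        for feature, weight in row.items():
            if weight > 0:
                index[feature].append((i, weight))
    return index


def weighted_pick(rng, pairs):
    """Sample the first element of a pair with probability proportional to its weight."""
    items, weights = zip(*pairs)
    return rng.choices(items, weights=weights, k=1)[0]


def feature_walks(view, num_walks, walk_length, seed=0):
    """Instance index sequences built from instance -> feature -> instance steps.

    Hypothetical illustration: instances are linked only implicitly through
    shared features, so no instance-by-instance similarity is computed.
    """
    rng = random.Random(seed)
    index = build_inverted_index(view)
    walks = []
    for _ in range(num_walks):
        for start in range(len(view)):
            walk = [start]
            current = start
            while len(walk) < walk_length:
                features = [(f, w) for f, w in view[current].items() if w > 0]
                if not features:  # instance without positive features: stop early
                    break
                feature = weighted_pick(rng, features)
                current = weighted_pick(rng, index[feature])
                walk.append(current)
            walks.append(walk)
    return walks


def multi_view_walks(views, num_walks, walk_length, seed=0):
    """Pool walks from every feature type; all views must index the same instances."""
    walks = []
    for k, view in enumerate(views):
        walks.extend(feature_walks(view, num_walks, walk_length, seed=seed + k))
    return walks


def skipgram_pairs(walks, window):
    """(center, context) pairs that a skip-gram style embedding model would train on."""
    pairs = []
    for walk in walks:
        for pos, center in enumerate(walk):
            lo, hi = max(0, pos - window), min(len(walk), pos + window + 1)
            pairs.extend((center, walk[j]) for j in range(lo, hi) if j != pos)
    return pairs


if __name__ == "__main__":
    tags = [{"a": 1.0}, {"a": 2.0, "b": 1.0}, {"b": 1.0}]
    words = [{"x": 1.0}, {"x": 1.0}, {"y": 2.0}]
    walks = multi_view_walks([tags, words], num_walks=2, walk_length=4)
    print(len(walks))  # 2 views * 2 passes * 3 instances = 12
    print(skipgram_pairs([[0, 1, 2]], window=1))  # [(0, 1), (1, 0), (1, 2), (2, 1)]
```

Under these assumptions, one step moves from instance i to instance j with probability equal to the sum over shared features f of (x_if / sum_g x_ig) * (x_jf / sum_k x_kf), so instances that share heavily weighted features tend to appear close together in the same sequence. That similarity is implicit and never stored, which is what keeps the memory cost linear in the number of nonzero feature values.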
Pages: 3878-3885
Page count: 8
Related papers
50 in total (10 listed below)
  • [1] Embedding Feature Selection for Large-scale Hierarchical Classification
    Naik, Azad
    Rangwala, Huzefa
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016: 1212-1221
  • [2] Large-Scale Embedding Learning in Heterogeneous Event Data
    Gui, Huan
    Liu, Jialu
    Tao, Fangbo
    Jiang, Meng
    Norick, Brandon
    Han, Jiawei
    [J]. 2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2016: 907-912
  • [3] Large-Scale Feature Matching with Distributed and Heterogeneous Computing
    Mills, Steven
    Eyers, David
    Leung, Kai-Cheung
    Tang, Xiaoxin
    Huang, Zhiyi
    [J]. PROCEEDINGS OF 2013 28TH INTERNATIONAL CONFERENCE ON IMAGE AND VISION COMPUTING NEW ZEALAND (IVCNZ 2013), 2013: 208-213
  • [4] DeHIN: A Decentralized Framework for Embedding Large-Scale Heterogeneous Information Networks
    Imran, Mubashir
    Yin, Hongzhi
    Chen, Tong
    Huang, Zi
    Zheng, Kai
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (04): 3645-3657
  • [5] A General Embedding Framework for Heterogeneous Information Learning in Large-Scale Networks
    Huang, Xiao
    Li, Jundong
    Zou, Na
    Hu, Xia
    [J]. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2018, 12 (06)
  • [6] PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks
    Tang, Jian
    Qu, Meng
    Mei, Qiaozhu
    [J]. KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2015: 1165-1174
  • [7] Heterogeneous Embedding Propagation for Large-scale E-Commerce User Alignment
    Zheng, Vincent W.
    Sha, Mo
    Li, Yuchen
    Yang, Hongxia
    Fang, Yuan
    Zhang, Zhenjie
    Tan, Kian-Lee
    Chang, Kevin Chen-Chuan
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2018: 1434-1439
  • [8] Large-scale predicting protein functions through heterogeneous feature fusion
    Zheng, Rongtao
    Huang, Zhijian
    Deng, Lei
    [J]. BRIEFINGS IN BIOINFORMATICS, 2023, 24 (04)
  • [9] LARGE-SCALE CROSS-MEDIA RETRIEVAL BY HETEROGENEOUS FEATURE AUGMENTATION
    Li, Qiang
    Han, Yahong
    Dang, Jianwu
    [J]. 2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2014: 977-980
  • [10] Large-scale offline signature recognition via deep neural networks and feature embedding
    Calik, Nurullah
    Kurban, Onur Can
    Yilmaz, Ali Riza
    Yildirim, Tulay
    Ata, Lutfiye Durak
    [J]. NEUROCOMPUTING, 2019, 359: 1-14