Large-Scale Heterogeneous Feature Embedding

Cited by: 0
Authors
Huang, Xiao [1]
Song, Qingquan [1]
Yang, Fan [1]
Hu, Xia [1]
Affiliations
[1] Texas A&M Univ, Dept Comp Sci & Engn, College Stn, TX 77843 USA
Keywords
DIMENSIONALITY
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature embedding aims to learn a low-dimensional vector representation for each instance that preserves the information in its features. These representations can benefit various off-the-shelf learning algorithms. While embedding models for a single type of feature have been well studied, real-world instances often contain multiple types of correlated features, or even information in a different modality such as networks. Existing studies, such as multiview learning, show that learning unified vector representations from all sources is promising. However, the high computational cost of incorporating heterogeneous information limits the applicability of existing algorithms, because in practice both the number of instances and the feature dimensionality are often large. To bridge this gap, we propose a scalable framework, FeatWalk, which models instance similarities in terms of different types of features and incorporates them into a unified embedding representation. To achieve scalability, FeatWalk does not directly calculate any similarity measure. Instead, it provides an alternative way to simulate similarity-based random walks among instances, extracting the local instance proximity and preserving it in a set of instance index sequences. These sequences are homogeneous with each other, so a scalable word embedding algorithm can be applied to them to learn a joint embedding representation of instances. Experiments on four real-world datasets demonstrate the efficiency and effectiveness of FeatWalk.
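The abstract does not spell out how the walks are simulated, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a non-negative instance-by-feature matrix and takes each step as two hops: from the current instance to one of its features (chosen in proportion to the feature value), then from that feature to an instance (chosen in proportion to that instance's value on the feature). No pairwise similarity matrix is ever built. The function name `feature_walks` and its parameters are hypothetical.

```python
import random


def feature_walks(X, walk_length=5, walks_per_instance=2, seed=0):
    """Generate instance index sequences by instance -> feature -> instance hops.

    X is a list of rows; X[i][j] >= 0 is the value of feature j for instance i.
    This two-hop scheme is an assumption made for illustration.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    # Column view: for each feature, the values of all instances on it.
    columns = [[X[i][j] for i in range(n)] for j in range(m)]
    walks = []
    for start in range(n):
        for _ in range(walks_per_instance):
            walk, current = [start], start
            for _ in range(walk_length - 1):
                row = X[current]
                if sum(row) == 0:  # instance with no active feature: stop early
                    break
                # Hop 1: pick a feature in proportion to its value on this instance.
                j = rng.choices(range(m), weights=row, k=1)[0]
                # Hop 2: pick an instance in proportion to its value on that feature.
                current = rng.choices(range(n), weights=columns[j], k=1)[0]
                walk.append(current)
            walks.append(walk)
    return walks


# Toy data: instances 0 and 1 share feature 0, instances 2 and 3 share feature 1.
X = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
walks = feature_walks(X)
```

Under this reading, walks generated from several feature matrices over the same instances could simply be pooled, since every walk is a sequence of instance indices. The pooled sequences could then be passed to any word embedding implementation, with each index treated as a token.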
Pages: 3878-3885
Number of pages: 8