Clustering on Sparse Data in Non-Overlapping Feature Space with Applications to Cancer Subtyping

Cited by: 2
Authors
Kang, Tianyu [1]
Zarringhalam, Kourosh [2]
Kuijjer, Marieke [3]
Chen, Ping [1]
Quackenbush, John [3]
Ding, Wei [1]
Affiliations
[1] Univ Massachusetts, Dept Comp Sci, Boston, MA 02125 USA
[2] Univ Massachusetts, Dept Math, Boston, MA 02125 USA
[3] Dana Farber Canc Inst, Boston, MA 02115 USA
Funding
National Science Foundation (USA)
Keywords
Unsupervised Learning; Clustering; Artificial Neural Networks
DOI
10.1109/ICDM.2018.00138
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a new algorithm, Reinforced and Informed Network-based Clustering (RINC), for finding unknown groups of similar data objects in a sparse and largely non-overlapping feature space where a network structure among features can be observed. Sparse, non-overlapping unlabeled data are becoming increasingly common and available, especially in text mining and biomedical data mining. RINC inserts a domain-informed model into a model-less neural network. In particular, our approach integrates physically meaningful feature dependencies into the neural network architecture and a soft computational constraint. Our learning algorithm efficiently clusters sparse data through integrated smoothing and sparse auto-encoder learning. The informed design requires fewer samples for training, and at least part of the model becomes explainable. The architecture of the reinforced network layers smooths sparse data over the network dependency in the feature space. Most importantly, through back-propagation, the weights of the reinforced smoothing layers are simultaneously constrained by the remaining sparse auto-encoder layers, which set the target values equal to the raw inputs. Empirical results demonstrate that RINC achieves improved accuracy and renders physically meaningful clustering results.
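The smoothing idea in the abstract can be illustrated with a minimal sketch. This is not the RINC algorithm itself: it uses a fixed, non-learned propagation step rather than layers trained by back-propagation, and the function name `smooth_over_network`, the mixing weight `alpha`, and the toy feature graph are all assumptions made for illustration. It shows only why spreading values over a feature network helps when samples share no non-zero features.

```python
import numpy as np

def smooth_over_network(X, A, alpha=0.5):
    """One fixed propagation step over a feature graph (illustrative only).

    X: (n_samples, n_features) sparse data matrix.
    A: (n_features, n_features) symmetric 0/1 adjacency among features.
    alpha: hypothetical mixing weight between raw and neighbour values.
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0          # isolated features keep only their own value
    W = A / deg                  # row-normalised: each row averages neighbours
    # Feature i receives the mean value of its neighbours in the graph.
    return (1 - alpha) * X + alpha * (X @ W.T)

# Toy feature network (assumed): feature 0 is linked to 1, feature 2 to 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])

# Three samples, each with a single non-zero feature: no two samples overlap.
X = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

S = smooth_over_network(X, A)
raw_similarity = X @ X.T         # off-diagonal entries are all zero
smooth_similarity = S @ S.T      # samples 0 and 1 now share mass

print(raw_similarity[0, 1], smooth_similarity[0, 1], smooth_similarity[0, 2])
```

In this toy case, samples 0 and 1 have zero raw overlap but become similar after smoothing because their features are neighbours in the graph, while sample 2 stays separate. According to the abstract, RINC learns the smoothing weights jointly with a sparse auto-encoder whose reconstruction target is the raw input; that training step is not reproduced here.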
Pages: 1079-1084
Page count: 6